ABSTRACT


In the Northern Hemisphere there are more than five times as many stations
reporting surface-synoptic data as there are reporting radiosonde observations.

A procedure has been developed to selectively diagnose upper-air humidity from
surface observations and to use both diagnostic and radiosonde data in an objective
analysis of dew-point spread using the successive-approximation technique.

Northern Hemisphere surface-synoptic and radiosonde data from August
through October 1964 are used to develop diagnostic relationships between surface-
observed variables at a single station and the dew-point spread at the 850-, 700-,
500-, and 400-mb levels above that station for the warm season of the year. The
approach consists of two steps: (1) the isolation, within a decision-tree framework,
of those cases for which individual surface-observed variables yield highly reliable
estimates of upper-level humidity, and (2) the application of a statistical technique
(Regression Estimation of Event Probabilities) to the remaining cases to derive
equations yielding probabilities of occurrence of specified categories of dew-point
spread. This approach yields useful diagnostic information of variable quality.

The incorporation of diagnostic data obtained from the cold-season relationships
(derived in earlier work) into a humidity analysis at the 850-, 700-, and 500-mb
levels is tested using European surface and upper-air data for 22 observation times
in February 1962. Sparse-data conditions are simulated by withholding a portion of
both the surface and the upper-air data.

RMS errors and contingency-table percent-correct scores indicate that an
improved analysis is obtained by weighting the diagnostic data relative to the
radiosonde data. The most appropriate weighting is a function of the reliability of
the diagnosis and the data density.
SECTION I


INTRODUCTION


The distribution of radiosonde stations in the Northern Hemisphere is very
uneven: few observations of upper-level moisture are available over the oceans and
sparsely inhabited land areas. Upper-air humidity in these regions must therefore be
inferred from whatever other observational information is available. Routine
surface-synoptic observations are one such source, provided that reliable
relationships between upper-level moisture and surface-observed variables can be
uncovered. The diagnostic information must then be combined with radiosonde data to
produce an optimum depiction of the initial-state moisture field. A reliable humidity
analysis has many uses, an obvious one being as a source of information for cloud
prediction.

The development of techniques to diagnose and predict moisture and clouds
has been pursued by many investigators. A discussion of previous research is
given in an earlier planning report [1].

In the first phase of the work undertaken by the authors, diagnostic relationships
were developed between 850-, 700-, 500-, and 400-mb dew-point spread
and surface-observed variables for the cold season of the year (using December,
January, and February data). This work is reported in [2]. The second phase,
reported here, was to develop diagnostic relationships for the warm season of the
year (using August, September, and October data). The diagnostic approach used
in both phases of the study consisted of isolating, within a decision-tree framework,
those diagnostic relations from which a highly reliable estimate of moisture
could be made, until the supply of direct, high-quality relations had been exhausted.
The remaining cases (the residual sample) were then investigated using a statistical
technique called Regression Estimation of Event Probabilities (REEP) [10]. The
diagnostic relations derived with REEP are useful but of variable quality. Therefore,
an objective analysis procedure was developed to use the diagnostic information
selectively. A series of tests was conducted with the Successive-Approximation
Technique (SAT) analysis procedure to determine the best means of combining the
observed and diagnosed data under varying conditions of data density and
distribution. The analysis area was limited to Europe, where progressively sparser
data conditions were simulated by withholding both radiosonde and surface data.
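The SAT formulas are not given in this section, so the following is only a generic Cressman-type successive-correction sketch, assumed here to be representative of how such a scheme combines observations with a first guess; it is not a transcription of the SAT program. The per-observation `weight` is a hypothetical device for down-weighting diagnosed values relative to radiosonde reports, and the function name and radii are likewise illustrative.

```python
import math

def successive_correction(grid_pts, first_guess, obs, radii):
    """Cressman-type successive-correction analysis of DPS (hypothetical sketch).

    grid_pts    : list of (x, y) grid-point coordinates
    first_guess : first-guess DPS (deg C) at each grid point
    obs         : list of (x, y, dps, weight); weight < 1 down-weights a
                  diagnosed value relative to a radiosonde report (weight 1)
    radii       : influence radii for successive scans, largest first
    """
    analysis = list(first_guess)
    for radius in radii:
        r2 = radius * radius
        # Residuals of each observation against the current analysis, using the
        # nearest grid point as a crude stand-in for interpolation.
        residuals = []
        for ox, oy, dps, w in obs:
            k = min(range(len(grid_pts)),
                    key=lambda i: math.dist(grid_pts[i], (ox, oy)))
            residuals.append((ox, oy, dps - analysis[k], w))
        updated = []
        for (gx, gy), value in zip(grid_pts, analysis):
            num = den = 0.0
            for ox, oy, res, w in residuals:
                d2 = (gx - ox) ** 2 + (gy - oy) ** 2
                if d2 < r2:
                    cw = w * (r2 - d2) / (r2 + d2)  # Cressman weight x reliability
                    num += cw * res
                    den += cw
            # Grid points with no observation in range keep their value.
            updated.append(value + num / den if den > 0.0 else value)
        analysis = updated
    return analysis
```

With a radiosonde (weight 1.0) and a diagnosed value (weight 0.25) at the same site, the diagnosed value pulls the analysis only a quarter as hard. How such weights should vary with diagnostic reliability and data density is what the report's tests are meant to establish.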
SECTION II


DATA PROCESSING


Two data samples were used in the diagnostic and analysis development
work. Warm-season diagnostic relationships were developed with Northern
Hemisphere surface-synoptic observations and upper-air soundings collected by
the United States Weather Bureau (USWB) at the National Meteorological Center
(NMC) in Suitland, Maryland. The analysis technique was developed and tested
with radiosonde and surface data gathered at the 3rd Weather Wing, Global Weather
Central (GWC), at Offutt AFB, Nebraska. The data collected at GWC were used
earlier to develop the cold-season diagnostic relationships [2].

1. NMC Data

Upper-air soundings (radiosondes) and surface-synoptic observations were
recorded on magnetic tape twice daily (00Z and 12Z) by the USWB for the period
August 25 through November 14, 1964. For portions of this period (the longest
being October 9-18), either surface or radiosonde data were missing.

Because the sample was to be used to develop warm-season diagnostic
relationships, selective processing was required to eliminate cases completely
unrepresentative of the warm portion of the year. As a consequence, November
data were not processed. Further, stations in specific blocks were eliminated
from the remainder of the sample (see Fig. 1). Because of these limitations and
the characteristics of the data sample, only 14,370 cases were processed,
considerably fewer than in the cold-season sample. In the decision-tree approach,
the surface variables are examined individually, so each type of each variable
accounts for only a small fraction of the cases. We therefore felt that the entire
sample was required for development, and no portion of it was set aside as
independent data for testing the derived warm-season decision-tree relationships.

It has been shown [2] that the cold-season decision-tree relationships were quite
stable. Limited testing of the warm-season decision-tree relationships was,
however, carried out with hand-collected data.

Fig. 1. Northern Hemisphere areas not used in the diagnostic study (stipple
pattern). Screened areas were omitted only in October.

Processing the data consisted of extracting the required surface and upper-air
information from multiple basic data tapes, manipulating it into a form suitable
for evaluation, and merging it into final form on one data tape. Considerable data
manipulation was required because the upper-air data were listed as seven
consecutive 00Z observation times followed by seven consecutive 12Z observation
times, whereas the surface data were listed by alternating 00Z and 12Z times.
Table I lists the variables that were considered in the decision-tree and statistical
evaluations.
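The reordering described above, from blocks of seven 00Z times followed by seven 12Z times into the alternating order of the surface data, can be sketched as follows. The record representation (plain labels) and the function name are hypothetical; the actual tape formats are not described here.

```python
def interleave_tape_blocks(records, block=7):
    """Reorder upper-air records from tape order into alternating 00Z/12Z order.

    Tape order is assumed to be `block` consecutive 00Z times followed by
    `block` consecutive 12Z times, repeated. Only complete 00Z/12Z pairs
    are kept; an unmatched record in a short final block is dropped.
    """
    out = []
    for start in range(0, len(records), 2 * block):
        zeros = records[start:start + block]
        twelves = records[start + block:start + 2 * block]
        for z, t in zip(zeros, twelves):
            out.extend((z, t))
    return out


tape = [f"day{d}-00Z" for d in range(1, 8)] + [f"day{d}-12Z" for d in range(1, 8)]
print(interleave_tape_blocks(tape)[:4])
# ['day1-00Z', 'day1-12Z', 'day2-00Z', 'day2-12Z']
```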


TABLE I

SURFACE VARIABLES USED IN DIAGNOSTIC STUDIES

Variable                                  Usage
Name                 Symbol  Units    Decision tree  Statistical
Wind direction       DD      deg.     no             yes
Wind speed           FF      knot     no             yes
Pressure             P       mb       no             yes
Temperature          T       °C       no             yes
Dew point            Td      °C       no             yes
Dew-point spread     DPS     °C       no             yes
Visibility           VV      mi       no             yes
Present weather      ww      -        yes            yes
Past weather         W       -        yes            yes
Total cloud amount   Nt      -        yes            yes
Low cloud amount     Nh      -        yes            yes
Low cloud height     h       -        yes            yes
Low cloud type       CL      -        yes            yes
Middle cloud type    CM      -        yes            yes
High cloud type      CH      -        yes            yes
Pressure change      app     -        yes            yes


Because of analysis requirements at NMC, a statistical value of dew-point
spread (DPS) was inserted in the upper-air data whenever a "motorboating" condition¹
occurred. The statistical value used was obtained from the Manual for Radiosonde
Code [6]. The statistical values of dew-point spread vary from 27°C at a temperature
of 20°C to 10°C at a temperature of -40°C. In the basic upper-air data it was
impossible to differentiate between a calculated and an observed dew point. Because
the statistically derived dew-point spread at 400 mb generally fell within the range
10°C to 16°C, it was felt that the resulting decision-tree and REEP relationships
would be biased by grouping very dry (motorboating) cases in the same dew-point-spread
interval as cases that were not nearly so dry. At the 850-, 700-, and 500-mb levels
the statistically derived dew-point spread is usually greater than 15°C for the
warm-season sample, so the development of the decision-tree and statistical
relationships was not hindered.

2. Offutt Data

The data processing required for the analysis development tests is described
briefly below. A more detailed description of specific computer program functions
and of the analysis technique tests is given in later sections. The processing of
Northern Hemisphere surface-synoptic and upper-air station data and condensation
pressure spread (CPS) grid-point data was limited to the area defined in Fig. 2,
for the period 00Z Feb. 6 to 12Z Feb. 16, 1962 (22 observation times). This
period was selected because all three types of data were available. The area
shown in Fig. 2, which includes most of Europe, was chosen for its high density of
surface and upper-air stations.
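As a quick consistency check on the period quoted above, stepping through twice-daily (00Z and 12Z) times from 00Z 6 February to 12Z 16 February 1962 reproduces the stated count of 22 observation times. The helper name is illustrative only.

```python
from datetime import datetime, timedelta

def observation_times(start, end, step_hours=12):
    """List synoptic times from start to end inclusive, every step_hours."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += timedelta(hours=step_hours)
    return times


times = observation_times(datetime(1962, 2, 6, 0), datetime(1962, 2, 16, 12))
print(len(times))  # 22
```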

The following data processing steps (see Fig. 3) were required:

(a) Surface-synoptic station data were extracted for the area and observation
times of interest. An attempt was made to diagnose dew-point spread
at 850, 700, and 500 mb at each station using the decision-tree relationships
and REEP equations.

¹ "Motorboating" is the term used to describe the audio signal transmitted by
the radiosonde humidity element when its frequency is so low that it resembles the
sound of a motorboat. At a given temperature the humidity varies directly with the
frequency of the signal; thus a very low frequency corresponds to a low humidity,
which cannot be measured accurately.
Fig. 2. Area (shaded) used in humidity analysis development tests; the dashed rectangle encloses the verification area.


Fig. 3. Data processing steps; computer program titles are underlined.
(b) Radiosonde station data were extracted for the same area and observation
times. At any level (850, 700, and 500 mb), when the temperature and height
were reported but the dew point was missing, motorboating was assumed and a
dew-point spread of 20°C was inserted. All station reports of DPS were
error-checked by comparison with the average DPS at stations in the vicinity,
a reasonable difference being allowed.

(c) Radiosonde and diagnostic information were combined and a variety
of data processing functions were performed: a measure of station data density
and distribution was obtained, and data were withheld.

(d)  Required  CPS  grid-point  data  were  extracted  and  converted  to  DPS. 

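CPS is a parameter developed at the Scientific Services Section of the USAF
Air Weather Service² for use in a cloud prediction model. CPS is defined as
the pressure difference $p_c - p$, where $p$ is the pressure of an air parcel before
lifting and $p_c$ is the pressure of the same parcel after condensation has
occurred through dry-adiabatic lifting. Lifting lowers the parcel temperature at
the rate $\gamma$ and its dew point at the rate $\gamma_m$, so saturation is reached
after the temperature has fallen by $\gamma(T - T_d)/(\gamma - \gamma_m)$; Poisson's
equation converts the resulting temperature ratio into a pressure ratio. The
expression for CPS is

$$\mathrm{CPS} = p_c - p = p\left\{\left[1 - \frac{\gamma\,(T - T_d)}{T\,(\gamma - \gamma_m)}\right]^{c_p/R} - 1\right\} \qquad \text{(II-1)}$$

where $\gamma$ and $\gamma_m$ are the dry-adiabatic and mixing-ratio lapse rates,
$T$ and $T_d$ are the initial temperature and dew point of the air parcel, $c_p$ is
the specific heat at constant pressure, and $R$ is the gas constant for dry air.
Because $p_c < p$, CPS is negative; a CPS value of -100 mb corresponds
approximately to a DPS of 10°C.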

(e) Radiosonde and diagnostic station data, together with the DPS grid-point
data, were then input in various combinations to the SAT (successive-approximation
technique) DPS analysis program for development testing of the analysis
technique.
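Step (b) above combines a motorboating substitution with a neighbor comparison. A minimal sketch follows. The 20°C substitute value comes from the text; the 500-km neighborhood radius, the 10°C tolerance standing in for "a reasonable difference," the planar x/y station coordinates, and all function names are hypothetical.

```python
import math

MOTORBOAT_DPS = 20.0  # DPS (deg C) inserted when T and height exist but Td is missing

def station_dps(temp, height, dewpoint):
    """Return the DPS (deg C) for one level, or None if the level is unusable."""
    if temp is None or height is None:
        return None
    if dewpoint is None:
        return MOTORBOAT_DPS  # motorboating assumed
    return temp - dewpoint

def passes_neighbor_check(idx, stations, radius_km=500.0, tolerance=10.0):
    """Compare station idx with the mean DPS of stations within radius_km.

    stations: list of (x_km, y_km, dps). radius_km and tolerance are
    hypothetical values, not taken from the report.
    """
    x0, y0, d0 = stations[idx]
    neighbors = [d for k, (x, y, d) in enumerate(stations)
                 if k != idx and math.hypot(x - x0, y - y0) <= radius_km]
    if not neighbors:
        return True  # nothing to compare against; keep the report
    return abs(d0 - sum(neighbors) / len(neighbors)) <= tolerance
```

A station reporting a 25°C spread amid neighbors near 3°C would be rejected, while one within the tolerance of the local mean would be kept.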


² Major Earl Kindle, USAF Retired, provided the description "AWS Cloud
Forecasting Program," from which this information was extracted.
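Equation (II-1) can be evaluated directly, and, as step (d) requires, inverted numerically to recover DPS from a CPS grid value. In the sketch below the constants are assumed typical values, not taken from the report: $\gamma$ = 9.8 K/km, $\gamma_m$ ≈ 1.8 K/km for the rate at which the dew point of an unsaturated parcel falls, and $c_p/R$ ≈ 3.5. Bisection is likewise an assumed choice of inversion method.

```python
GAMMA_D = 9.8      # dry-adiabatic lapse rate, K/km (assumed value)
GAMMA_M = 1.8      # dew-point (constant mixing-ratio) lapse rate, K/km (assumed value)
CP_OVER_R = 3.5    # c_p / R for dry air, approximately 1004/287

def cps(p_mb, t_c, dps_c):
    """Condensation pressure spread p_c - p (mb) from Eq. (II-1).

    p_mb: parcel pressure (mb); t_c: temperature (deg C); dps_c: T - Td (deg C).
    """
    t_k = t_c + 273.15
    ratio = 1.0 - GAMMA_D * dps_c / (t_k * (GAMMA_D - GAMMA_M))
    return p_mb * (ratio ** CP_OVER_R - 1.0)

def dps_from_cps(cps_mb, p_mb, t_c, lo=0.0, hi=60.0, tol=1e-6):
    """Invert Eq. (II-1) for DPS by bisection; CPS decreases as DPS grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cps(p_mb, t_c, mid) > cps_mb:
            lo = mid   # spread still too small to produce this CPS
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these assumed constants, `cps(700.0, 0.0, 10.0)` comes out at roughly -104 mb, in line with the statement that a CPS of -100 mb corresponds approximately to a 10°C spread.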
SECTION III


DECISION-TREE TECHNIQUE

The objective of the decision-tree phase of the study was to select, from the
surface-observed data, variables with a high correlation to moisture content at the
(mandatory) 850-, 700-, and 500-mb constant-pressure levels³ in order to acquire
additional, reliable estimates of humidity for use in an objective moisture
analysis. Thus, we attempted to find the combination of surface-observed variables
that yields the greatest number of reliable estimates of moisture content. Because
there are a large number of possible combinations, we selected the several most
promising through an initial examination of DPS histograms of the variables.

We examined all types of each variable (listed below) to determine which were
best related to upper-level dew-point spread (DPS) at each level. This was done by
developing a histogram for each type of each variable and evaluating the frequency
distribution of DPS for each. The variables considered in the decision-tree phase of
the study were: present weather (ww); past weather (W); cloud type (CL = low,
CM = middle, and CH = high); cloud amount (Nh = low, and Nt = total); low-cloud
height (h); and 3-hr pressure change (app). Table II contains the abridged
descriptions of the 100 present-weather types, and Table III contains the abridged
descriptions of the past-weather and low-, middle-, and high-cloud types [?]. This
information was taken from a USWB Daily Weather Map.

The evaluation of the histograms consisted of isolating those types of each
variable for which a predefined percentage of the cases (the threshold value) fell
within certain limiting values of DPS. The threshold values at each of the pressure
levels were determined on subjective grounds [2].

Before the decision trees at the three levels were developed, the full sample of
14,370 cases was tabulated to determine:


³ Diagnostic relationships for 400 mb were not developed, for the reasons explained
in Section II.
TABLE II

ABRIDGED DESCRIPTION OF PRESENT-WEATHER TYPES
TABLE III

ABRIDGED DESCRIPTION OF PAST-WEATHER AND
LOW-, MIDDLE-, AND HIGH-CLOUD TYPES
(Abridged from WMO Code)

CL   Description
1    Cu of fair weather, little vertical development and seemingly flattened
2    Cu of considerable development, generally towering, with or without other
     Cu or Sc, bases all at same level
3    Cb with tops lacking clear-cut outlines, but distinctly not cirriform or
     anvil-shaped, with or without Cu, Sc, or St
4    Sc formed by spreading out of Cu; Cu often present also
5    Sc not formed by spreading out of Cu
6    St or Fs or both, but no Fs of bad weather
7    Fs and/or Fc of bad weather (scud)
8    Cu and Sc (not formed by spreading out of Cu) with bases at different levels
9    Cb having a clearly fibrous (cirriform) top, often anvil-shaped, with or
     without Cu, Sc, St, or scud

CM   Description
1    Thin As (most of cloud layer semi-transparent)
2    Thick As, greater part sufficiently dense to hide sun (or moon), or Ns
3    Thin Ac, mostly semi-transparent; cloud elements not changing much and at
     a single level
4    Thin Ac in patches; cloud elements continually changing and/or occurring
     at more than one level
5    Thin Ac in bands or in a layer gradually spreading over sky and usually
     thickening as a whole
6    Ac formed by the spreading out of Cu
7    Double-layered Ac, or a thick layer of Ac, not increasing, or Ac with As
     and/or Ns
8    Ac in the form of Cu-shaped tufts or Ac with turrets
9    Ac of a chaotic sky, usually at different levels; patches of dense Ci are
     usually present also

CH   Description
1    Filaments of Ci, or "mares' tails," scattered and not increasing
2    Dense Ci in patches or twisted sheaves, usually not increasing, sometimes
     like remains of Cb; or towers or tufts
3    Dense Ci, often anvil-shaped, derived from or associated with Cb
4    Ci, often hook-shaped, gradually spreading over the sky and usually
     thickening as a whole
5    Ci and Cs, often in converging bands, or Cs alone; generally overspreading
     and growing denser; the continuous layer not reaching 45° altitude
6    Ci and Cs, often in converging bands, or Cs alone; generally overspreading
     and growing denser; the continuous layer exceeding 45° altitude
7    Veil of Cs covering the entire sky
8    Cs not increasing and not covering entire sky
9    Cc alone or Cc with some Ci or Cs, but the Cc being the main cirriform cloud

W    Past weather
0    Clear or few clouds (not plotted)
1    Partly cloudy (scattered) or variable sky (not plotted)
2    Cloudy (broken) or overcast (not plotted)
3    Sandstorm or duststorm, or drifting or blowing snow
4    Fog, or smoke, or thick dust haze
5    Drizzle
6    Rain
7    Snow, or rain and snow mixed, or ice pellets (sleet)
8    Shower(s)
9    Thunderstorm, with or without precipitation

Cloud abbreviations: St or Fs = Stratus or Fractostratus; Ci = Cirrus;
Cs = Cirrostratus; Cc = Cirrocumulus; Ac = Altocumulus; As = Altostratus;
Sc = Stratocumulus; Ns = Nimbostratus; Cu or Fc = Cumulus or Fractocumulus;
Cb = Cumulonimbus.
(a) the DPS climatology at the 850-, 700-, and 500-mb levels during
the warm season (see Fig. 4),

(b) the likelihood of developing useful decision trees by considering
each variable separately, and

(c) the limitations imposed by the frequency of occurrence of particular
surface variables in the warm season.

These results are referred to in the discussion of each level below.

3. 850-mb Decision Tree

For the 850-mb decision tree, variables were accepted at the first branch of
the tree if 60% or more of the cases fell within five consecutive 1°C DPS intervals,
and at the other branches if 55% or more of the cases fell within five consecutive
1°C DPS intervals.
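The acceptance rule can be sketched as a sliding five-interval window over a DPS histogram. Here `counts[i]` is assumed to hold the number of cases whose DPS, rounded to the nearest whole degree, equals i°C, so the window's central interval serves as its midpoint; that binning convention and the function names are assumptions. The 700-mb thresholds from subsection 4 are included for comparison.

```python
# Acceptance thresholds quoted in the text: (first branch, later branches)
THRESHOLDS = {850: (0.60, 0.55), 700: (0.50, 0.45)}

def best_window(counts, width=5):
    """Return (midpoint DPS, fraction of cases) for the `width` consecutive
    1-deg C intervals holding the most cases.

    counts[i] is assumed to be the number of cases whose DPS, rounded to
    the nearest whole degree C, equals i.
    """
    total = sum(counts)
    best_start, best_sum = 0, -1
    for start in range(len(counts) - width + 1):
        window_sum = sum(counts[start:start + width])
        if window_sum > best_sum:          # the first window wins ties
            best_start, best_sum = start, window_sum
    return best_start + width // 2, best_sum / total

def is_acceptable(counts, level_mb, first_branch):
    """Apply the five-interval criterion; return (accepted, midpoint DPS)."""
    midpoint, fraction = best_window(counts)
    first, later = THRESHOLDS[level_mb]
    return fraction >= (first if first_branch else later), midpoint
```

A variable type whose best window holds exactly half of its cases would pass the 700-mb first-branch test but fail at any 850-mb branch.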

Analysis of the full-sample climatology and the individual histograms for the
850-mb level suggested that the variables low-cloud type (CL), low-cloud amount
(Nh), present weather (ww), and past weather (W) contain many mutually exclusive
types well related to 850-mb humidity. Thus a decision tree was developed in
which each surface variable was considered separately. Further, the 850-mb
climatology of warm-season observations made it obvious that the diagnostic
relations developed for 850 mb would be restricted to those implying moist
conditions, because (a) low-level (850-mb) dryness occurs infrequently and (b) no
clear-cut association of low-level dryness with specific types of surface variables
has been found.

To find the sequence of the selected variables that would lead to optimum
results, many alternatives were screened. The following sequences, determined by
isolating the acceptable relationships, were considered:

(a) low-cloud type, low-cloud amount, present weather, past weather,

(b) low-cloud type, low-cloud amount, past weather, middle-cloud type,

(c) low-cloud type, low-cloud amount, low-cloud height, past weather,

(d) low-cloud type, past weather, present weather,

(e) past weather, low-cloud type, present weather.


Fig. 4. Climatology of 850-, 700-, and 500-mb dew-point spread (ordinate: number of occurrences).


Table IV summarizes the number of cases of each variable accepted (by the
criterion described above) in each of the five sequences. Sequence (a) was chosen
for further development, even though sequence (c) yielded many more acceptable
diagnoses than any other. The reasons for this choice follow. First, a large
percentage (over 90%) of the cases in the 3rd and 4th branches of sequence (c) barely
met the acceptance requirements, whereas nearly all of the cases in the same branches
of sequence (a) were well above the minimum acceptable level. Second, if there was
indeed reliable and usable information in the cases included in sequence (c) but not
in sequence (a), that information would be gleaned from the residual data
sample by the REEP technique. Finally, the order in which the surface variables
are considered in sequence (a) reflects the general characteristics of warm-season
weather: widespread areas of precipitation are less prevalent, and convective-type
cloudiness more prevalent, than in winter, forcing the researcher or the synoptician
to rely more heavily on cloudiness for indirect estimates of upper-air humidity.
Thus, sequence (a) was chosen; it is discussed in detail below.


TABLE IV

NUMBER OF ACCEPTABLE CASES
(850 mb)

Sequence   Order of selection                                   Total
           1            2            3            4
(a)        2416 (CL)    2746 (Nh)    264 (ww)     655 (W)       6081
(b)        2416 (CL)    2746 (Nh)    756 (W)      200 (CM)      6118
(c)        2416 (CL)    2746 (Nh)    3018 (h)     251 (W)       8431
(d)        2416 (CL)    1459 (W)     302 (ww)     -             4177
(e)        2886 (W)     2970 (CL)    250 (ww)     -             6106
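Summing the branch counts transcribed from Table IV reproduces its totals and confirms that sequence (c), not (a), gives the largest number of acceptable diagnoses. The choice of (a) therefore rests on the reliability arguments in the text rather than on raw counts.

```python
# Branch counts transcribed from Table IV (850 mb); empty branches omitted.
TABLE_IV = {
    "a": [("CL", 2416), ("Nh", 2746), ("ww", 264), ("W", 655)],
    "b": [("CL", 2416), ("Nh", 2746), ("W", 756), ("CM", 200)],
    "c": [("CL", 2416), ("Nh", 2746), ("h", 3018), ("W", 251)],
    "d": [("CL", 2416), ("W", 1459), ("ww", 302)],
    "e": [("W", 2886), ("CL", 2970), ("ww", 250)],
}

totals = {seq: sum(n for _, n in branches) for seq, branches in TABLE_IV.items()}
best = max(totals, key=totals.get)
print(totals)  # {'a': 6081, 'b': 6118, 'c': 8431, 'd': 4177, 'e': 6106}
print(best)    # c
```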

Figure 5 shows the recommended 850-mb decision tree. The dew-point spread
(DPS) value in each subdivision of the decision tree is the midpoint value of the five
consecutive 1°C DPS intervals containing the greatest number of cases for each
selected variable.
Fig. 5. 850-mb decision tree for warm season.
Table V gives more information about the selected variables individually,
such as the modal DPS interval, the midpoint value and percentage of the five
consecutive intervals, and the number of cases in the developmental sample in
which the particular variable was observed. Figures 6 through 9 are individual
histograms of the selected variables.

TABLE V

SEQUENCE (a) SELECTED SURFACE VARIABLES
(850 mb)

                      Modal       Five-interval window     Number of
Variable   Type       interval    Midpoint    Percent      diagnoses
                      DPS (°C)    DPS (°C)
CL         3          3           3           60           358
           7          1           2           83           655
           8          2           3           69           520
           9          2           3           61           798
Nh         5-7        2           3           56           1769
           8          1           2           55           715
           9          1           2           57           185
ww         15         4           4           59           40
           21         4           3           68           39
           25         2           3           76           85
           60-61      1 and 2     3           79           69
           80         5           3           64           25
W          6          4           4           63           169
           7          3           3           66           50
           8          4           4           67           363
           9          3 and 7     5           53           57
Fig. 6. Distribution of 850-mb dew-point spread for selected low-cloud types.

Fig. 7. Distribution of 850-mb dew-point spread for selected low-cloud amount types.

Fig. 8. Distribution of 850-mb dew-point spread for selected present-weather types.

Fig. 9. Distribution of 850-mb dew-point spread for selected past-weather types.


The first variable in sequence (a) is low-cloud type. Of the several types of
low cloud, four yielded acceptable estimates of the 850-mb DPS: convective
cloudiness of considerable vertical extent (types 3 and 9); multilayered
cumuliform cloudiness (type 8); and low cloudiness generally associated with
extratropical cyclones (type 7). Type 7 implies the most nearly saturated
conditions at 850 mb. After the cases containing these low-cloud types (types 3, 7,
8, and 9) are eliminated from the data sample, the next variable (low-cloud amount)
is considered.

Of the five categories of low-cloud amount (clear, scattered, broken, overcast,
and obscured), three yielded acceptable relationships to the 850-mb humidity:
broken (5-7/8), overcast (8/8), and obscured (code 9). [Recall that at this stage
these amounts of low cloudiness are associated only with stratiform clouds (types
5 and 6) or cumulus clouds of little vertical extent (types 1, 2, and 4).] The data
sample was then further reduced by eliminating these cases.

The third variable considered was present weather. The vast majority of
weather events (precipitation) in the cases still remaining in the data sample
would result from middle clouds not associated with widespread low cloudiness.
Although the five-interval percentages are quite high for the selected
present-weather types, the number of cases of each (see the last two columns of
Table V) is rather small, because widespread precipitation occurs less frequently
during the warm months.

Finally, additional useful estimates were gleaned from certain past-weather
types after the acceptable present-weather types had been removed from the sample.
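Sequence (a) can be sketched as an ordered chain of lookups, each applied only to cases not captured by an earlier branch. The DPS values are the five-interval midpoints listed in Table V. The dictionary layout of an observation, the function name, and the splitting of the Nh 5-7 category into codes 5, 6, and 7 are assumptions for illustration. Cases that fall through every branch form the residual sample passed to REEP.

```python
# Midpoint DPS (deg C) per accepted type, transcribed from Table V (850 mb).
CL_DPS = {3: 3, 7: 2, 8: 3, 9: 3}
NH_DPS = {5: 3, 6: 3, 7: 3, 8: 2, 9: 2}        # Nh 5-7 is one entry in Table V
WW_DPS = {15: 4, 21: 3, 25: 3, 60: 3, 61: 3, 80: 3}
W_DPS = {6: 4, 7: 3, 8: 4, 9: 5}

def diagnose_850(obs):
    """Return (DPS in deg C, branch) or (None, 'residual') for one station.

    obs: dict with integer codes under 'CL', 'Nh', 'ww', 'W'
    (a missing key simply fails to match that branch).
    """
    for key, table in (("CL", CL_DPS), ("Nh", NH_DPS),
                       ("ww", WW_DPS), ("W", W_DPS)):
        code = obs.get(key)
        if code in table:
            return table[code], key
    return None, "residual"   # left for the REEP equations
```

Because the branches are tried in order, an observation with CL = 7 and Nh = 8 is diagnosed from low-cloud type alone, as in the decision tree.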

After the acceptable past-weather cases were removed from the sample, the
remaining cases constituted the "residual sample," from which statistical estimates
of 850-mb humidity were obtained using REEP. This is described in subsection 6.
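REEP fits one linear regression per DPS category, using a 0/1 indicator of that category as the predictand, and treats the fitted values as probabilities. A minimal least-squares sketch follows. The predictors, category boundaries, and toy data are hypothetical; the report's own predictor screening is not reproduced.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_reep(predictors, categories, n_categories):
    """Fit one least-squares equation per DPS category (REEP).

    predictors : list of predictor vectors, one per case
    categories : observed DPS category index (0 .. n_categories-1) per case
    Returns a coefficient vector (intercept first) for each category.
    """
    rows = [[1.0] + list(x) for x in predictors]
    p = len(rows[0])
    # Normal equations (X^T X) beta = X^T y; X is shared by all categories.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    coeffs = []
    for k in range(n_categories):
        y = [1.0 if c == k else 0.0 for c in categories]   # event indicator
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
        coeffs.append(solve(xtx, xty))
    return coeffs


def reep_probabilities(coeffs, x):
    """Evaluate every category equation for one case."""
    row = [1.0] + list(x)
    return [sum(b * v for b, v in zip(beta, row)) for beta in coeffs]


# Toy data: one hypothetical binary predictor, three DPS categories.
preds = [[0], [0], [0], [0], [1], [1], [1], [1]]
cats = [0, 0, 1, 1, 1, 1, 1, 2]
coeffs = fit_reep(preds, cats, 3)
print(reep_probabilities(coeffs, [1]))  # approximately [0.0, 0.75, 0.25]
```

Because every category equation shares the same predictors and an intercept, the probabilities for a case always sum to 1. Individual values can, however, fall slightly outside the range 0 to 1.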

Comparison of the decision tree derived from warm-season data with the one
derived from cold-season data [2] reveals both differences and similarities. One
basic difference is the order in which the variables are considered. During the cold
season, when widespread areas of precipitation are more common, the present-weather
variable is considered first; during the warm season, low-cloud type is
considered first. Another difference is that the present-weather types yielding
good estimates in the cold season far outnumber those yielding good estimates in
the warm season. This is mainly because these present-weather types occur too
infrequently in the summer months to yield stable relationships (even when all such
events are grouped together on sound synoptic reasoning). The warm- and cold-season
850-mb decision trees have these points in common: they use the same variables (but
in a different order), and they provide relationships yielding only moist diagnoses.

4. 700-mb Decision Tree

For the 700-mb decision tree, variables were accepted at the first branch
of the tree if 50% or more of the cases fell within five consecutive 1°C DPS
intervals, and at the other branches if 45% or more of the cases fell within five
consecutive 1°C DPS intervals.

Analysis of the full-sample climatology and the individual histograms for the
700-mb level justified developing a decision tree that considers each surface
variable separately. The DPS climatology for the 700-mb level, rather similar
to that for 850 mb, showed potential only for diagnostic relationships indicative
of moist conditions.

Four sequences for using the surface-observed variables were considered at
700 mb. They were:

(a) total-cloud amount, low-cloud type, middle-cloud type, present weather,
past weather,

(b) total-cloud amount, past weather, low-cloud type, present weather,

(c) present weather, low-cloud type, middle-cloud type, past weather,

(d) present weather, middle-cloud type, low-cloud type, past weather.

Table VI gives the number of cases containing acceptable relationships in each
of the four sequences. Sequence (a) was selected for the 700-mb decision tree
because (1) it yielded more acceptable diagnoses and (2) for the most part, the
selected variables yield slightly more reliable estimates in sequence (a). The
recommended 700-mb decision tree is discussed in detail below.
TABLE VI

NUMBER OF ACCEPTABLE CASES (700 mb)

                            Order of selection
Sequence      1           2           3           4           5        Total
(a)       2895 (Nt)    689 (CL)    303 (CM)    428 (ww)    378 (W)     4963
(b)       2895 (Nt)    616 (W)     553 (CL)    325 (ww)       -        4389
(c)       1905 (ww)    918 (CL)    179 (CM)    510 (W)        -        3512
(d)       1905 (ww)    569 (CM)    773 (CL)    495 (W)        -        3702

Figure 10 is the recommended 700-mb decision tree. As in Fig. 5, the
DPS value in each subdivision is the midpoint value. Table VII gives more information
about the selected variables individually, such as the modal DPS interval, the
midpoint value and percentage of accepted cases, and the number of cases in the
developmental sample in which the particular variable was observed. Figures 11
through 14 are individual histograms of the selected variables.


TABLE VII

SEQUENCE (a) SELECTED SURFACE VARIABLES (700 mb)

Variable   Type   Modal DPS   Midpoint DPS   Five-interval percent   Diagnoses
Nt           8       1°C          2°C                 56                2801
CL           7       2°C          3°C                 52                  78
             9       3°C          5°C                 49                 590
CM           1       2°C          4°C                 46                  50
             6       5°C          5°C                 45                 242
ww          13       7°C          6°C                 59                  26
            14       3°C          5°C                 44                  27
            15       5°C          5°C                 48                  54
            21       2°C          2°C                 77                  37
            25       4°C          3°C                 45                 139
            51       4°C          2°C                 47                  26
            60       1°C          2°C                 83                  22
            61       2°C          2°C                 54                  21
            71       2°C          3°C                 80                  19
            80       5°C          3°C                 52                  45
W            6       3°C          3°C                 56                 237
             7       4°C          2°C                 44                  62
             9       6°C          4°C                 53                  68


Fig. 10. 700-mb decision tree for warm season.
Fig. 11. Distribution of 700-mb dew-point spread for selected total-cloud amount and low-cloud types.
Fig. 12. Distribution of 700-mb dew-point spread for selected middle-cloud types.
Fig. 13. Distribution of 700-mb dew-point spread for selected present-weather types.
Fig. 14. Distribution of 700-mb dew-point spread for selected past-weather types.


At 700 mb, the first variable considered is total cloud amount. The overcast
category of total cloud amount yielded an acceptable relationship, while the other
categories did not. Although well-defined weather systems occur less frequently
during the warm season, cloudiness resulting in overcast conditions is closely
related to moist conditions at mid-tropospheric levels.

The second variable considered is the low-cloud type. Two of the nine cloud
types yielded acceptable relationships to the 700-mb DPS. They were type 7 (scud
clouds associated with extratropical cyclones) and type 9 (anvil-shaped
cumulonimbus). Note that type 3 (cumulonimbus, not anvil-shaped) had been selected
at 850 mb but was unacceptable at 700 mb.

The  third  variable  considered  is  middle  cloud  type.  Here,  again,  two  of  the 
nine  types  yielded  adequate  relations  to  the  700-mb  DPS.  They  were  type  1  (thin 
altostratus)  and  type  6  (low  altocumulus  formed  by  spreading  out  of  cumulus). 

The fourth variable considered is present weather. The selected types
varied from light precipitation (for which a diagnosed value of 2°C is recommended)
to observed lightning (with a diagnosed value of 6°C).

Finally, three types of past weather account for the last branch of the 700-mb
decision tree. They were type 6 (rain), type 7 (snow), and type 9 (thunderstorms).

The part of the developmental sample remaining after all cases containing
acceptable surface-observed variables had been removed was called the "residual
sample".

Statistical  estimates  of  the  700-mb  DPS  were  derived  from  this  residual 
sample  using  the  REEP  technique.  The  results  are  discussed  in  subsection  7. 
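Read from top to bottom, the 700-mb tree amounts to an ordered lookup. The sketch below is an illustration only: it assumes that the first surface variable (in sequence (a) order) whose observed code appears in Table VII decides the case, it uses the Table VII midpoint values, and it represents an observation as a hypothetical dictionary of synoptic code figures. Anything unmatched is left for the REEP equations.

```python
# Midpoint DPS (degC) for each accepted code of each variable, from Table VII,
# in the order of sequence (a): Nt, CL, CM, ww, W.
SEQUENCE_A = [
    ("Nt", {8: 2}),
    ("CL", {7: 3, 9: 5}),
    ("CM", {1: 4, 6: 5}),
    ("ww", {13: 6, 14: 5, 15: 5, 21: 2, 25: 3, 51: 2, 60: 2, 61: 2, 71: 3, 80: 3}),
    ("W", {6: 3, 7: 2, 9: 4}),
]


def diagnose_700mb(obs):
    """Return (diagnosed DPS in degC, deciding variable), or (None, None)
    when the case belongs to the residual sample handled by REEP."""
    for name, accepted in SEQUENCE_A:
        code = obs.get(name)  # an unreported variable never matches
        if code in accepted:
            return accepted[code], name
    return None, None


print(diagnose_700mb({"Nt": 6, "CL": 9, "ww": 60}))  # (5, 'CL')
```

Because the branches are tried in order, a case with both an acceptable low-cloud type and an acceptable present-weather type is diagnosed from the low cloud alone.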

As was the case at 850 mb, there are similarities and differences between
the 700-mb decision trees developed for the warm and cold seasons. Four variables
(ww, CL, W, CM) constitute the cold-season decision tree, while the warm-season
decision tree comprises these variables plus Nt. The order in which the
variables are considered reflects the basic difference between summer and winter
weather, as was explained in the discussion of the 850-mb trees. The recommended
values of DPS in the cold season were either 2°C or 3°C, while in the warm season
they vary from 2°C to 6°C.
5. 500-mb Decision Tree

It had been found [2], in the development of the 500-mb cold-season decision
tree, that usable results could be achieved only by considering variables jointly
(rather than individually, as had been done at the lower levels, 850 and 700 mb).
Preliminary examination of the full-sample climatology and individual histograms
indicated that a similar approach would be necessary for the development of the
warm-season 500-mb decision tree. Further, it was apparent from the climatology that
estimates of both moist and dry conditions could be realized at 500 mb.

Thus it was necessary to establish threshold values for both moist and dry
conditions. For moist conditions the threshold was 45% or more of the cases within
five consecutive 1°C DPS intervals; for dry conditions, 60% or more of the cases
with DPS greater than or equal to 15°C. Note that the 500-mb climatology (Fig. 4)
has a primary peak at 17°C because of the use of statistical values of DPS, which
vary from 27°C at a temperature of 20°C to 10°C at a temperature of -40°C. The
500-mb DPS climatology has therefore been altered significantly for spreads of
about 15°C and greater.
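The two thresholds differ in kind: the moist test looks for a concentrated five-interval peak, while the dry test only counts cases at or beyond 15°C. Here is a minimal sketch of the dry test, assuming a plain list of 500-mb DPS values for the cases in one surface category (the function name is hypothetical):

```python
def dry_category_accepted(dps_values, min_fraction=0.60, dry_limit=15.0):
    """Return (accepted, fraction) for the 500-mb dry criterion:
    at least 60% of the cases must have DPS >= 15 degC."""
    if not dps_values:
        return False, 0.0
    dry = sum(1 for dps in dps_values if dps >= dry_limit)
    fraction = dry / len(dps_values)
    return fraction >= min_fraction, fraction


# Seven of these ten hypothetical cases have DPS >= 15 degC.
print(dry_category_accepted([17, 17, 20, 15, 3, 8, 25, 16, 2, 30]))  # (True, 0.7)
```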

There were several logical ways to consider the surface variables jointly to
form subsamples that would separate the moist cases from the dry cases. Considerable
experimentation with the cold-season data gave us the combinations that yielded
the best basis for stratification of the moist and dry cases [2]. Preliminary appraisal
of the warm-season histograms indicated that similar combinations were appropriate.

Figure 15 is the recommended 500-mb decision tree. The left side of the
figure shows the combinations of surface-observed variables used to stratify the
data sample, and the right side shows the variables that yielded satisfactory 500-mb
DPS estimates. Table VIII gives more information about the selected variables, and
Figs. 16 through 18 are individual histograms of the selected variables.

To isolate cases representing moist conditions at 500 mb, the occurrence of
low-cloud type 7 or middle-cloud type 1, 2, or 7 was considered. In this subsample,
a satisfactory association was realized between an overcast total cloud amount
and the 500-mb DPS.
Fig. 15. 500-mb decision tree for warm season.
Fig. 16. Distribution of 500-mb dew-point spread for selected total-cloud amount types (moist subsample).
Fig. 17. Distribution of 500-mb dew-point spread for selected past-weather types (dry subsample).
Fig. 18. Distribution of 500-mb dew-point spread for selected past-weather types (marginal-moist subsample).


TABLE VIII

SELECTED SURFACE VARIABLES (500 mb)

Variable (subsample)                       Type  Modal DPS  Midpoint DPS  Five-interval percent  Diagnoses
Nt (CL = 7 or CM = 1, 2, 7) [moist]          8     3°C         3°C               56                1557
W (Nt = 0-4 and CM = 0) [dry]                0    ≥15°C       ≥15°C              63*               3709
                                             4    ≥15°C       ≥15°C              63*                187
W [marginal moist]                           6     3°C         3°C               46                 341
                                             7     3°C         4°C               50                  85
                                             9     2°C         4°C               49                 188
                                                                                        TOTAL     6067

*Percentage of cases where the 500-mb dew-point spread is greater than or
equal to 15°C.


To isolate cases representing dry conditions at 500 mb, the variables total
cloud amount and middle-cloud type were considered jointly. In particular, the
dry subsample was limited to those cases in which the total cloud amount was
clear or scattered (Nt = 0-4) and no middle-cloud type was reported
(CM = 0). Within this subsample, two types of past weather (past 6 hr) yielded
acceptable estimates of 500-mb DPS. They were type 0 (clear or few clouds) and
type 4 (fog), both generally associated with widespread subsidence at levels below
500 mb.

The remaining subsample could not be satisfactorily stratified into marginal-moist
and marginal-dry subsamples, as had been done with the cold-season data.
However, estimates of moist conditions were possible from this subsample on the
basis of past weather.

The three types of past weather selected were type 6 (rain), type 7 (snow),
and type 9 (thunderstorms).

Having  exhausted  useful  decision-tree  diagnostic  estimates  from  the  500-mb 
developmental  sample,  the  remaining  sample  (residual  sample)  was  set  aside  for 
the  statistical  evaluation,  discussed  in  subsection  8. 
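Unlike the 700-mb tree, the 500-mb tree first sorts each case into a subsample using two variables jointly and only then looks for an accepted category. The sketch below encodes that structure with the Table VIII midpoint values. Two points are assumptions rather than statements from the report: cases that enter the moist or dry subsample but fail its test are sent to the residual sample, and observations are represented as a hypothetical dictionary of code figures.

```python
DRY = ">=15"  # the dry branch only states that DPS >= 15 degC (Table VIII)


def diagnose_500mb(obs):
    """Sketch of the warm-season 500-mb tree (Fig. 15, Table VIII).

    Returns (diagnosed DPS in degC, DRY, or None; subsample name)."""
    nt, cl, cm, w = (obs.get(key) for key in ("Nt", "CL", "CM", "W"))

    # Moist subsample: low-cloud type 7 or middle-cloud type 1, 2 or 7.
    if cl == 7 or cm in (1, 2, 7):
        # Only an overcast sky gives an accepted diagnosis here.
        return (3, "moist") if nt == 8 else (None, "residual")

    # Dry subsample: clear or scattered sky and no middle cloud reported.
    if nt is not None and 0 <= nt <= 4 and cm == 0:
        return (DRY, "dry") if w in (0, 4) else (None, "residual")

    # Remaining subsample: moist estimates from past rain, snow or thunderstorm.
    marginal_moist = {6: 3, 7: 4, 9: 4}
    if w in marginal_moist:
        return marginal_moist[w], "marginal moist"
    return None, "residual"


print(diagnose_500mb({"Nt": 2, "CM": 0, "W": 0}))  # ('>=15', 'dry')
```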
The basic difference between the warm-season and cold-season decision
trees developed for 500 mb is that the cold-season data were stratified into four
subsamples and the warm-season data into three. Other differences concern the
variable yielding an acceptable diagnosis within the dry subsample: in the
warm-season decision tree described herein, past-weather types 0 and 4 were used,
while low-cloud type was used in the cold season. Characteristics common to both
are (a) the consideration of more than one variable at a time in developing the
decision tree, and (b) the use of identical variables to stratify the moist and
dry subsamples in both seasons.
SECTION IV


STATISTICAL  TECHNIQUE 


The technique selected for the development of the diagnostic relations was
the Regression Estimation of Event Probabilities (REEP) developed by Miller [10].
It was chosen because (a) it provides probability estimates for the several categories
of the specificand (dew-point spread) and (b) the diagnostic relations developed by
REEP can be used efficiently within an operational computer system, where timing
and storage considerations may be critical.

The procedure for the selection of specifiers in REEP is similar to that used
in screening multiple discriminant analysis (MDA).⁴ MDA is a technique for
selecting, from a large set of possible specifiers, the minimum number that most
efficiently achieves discrimination among the groups of the specificand. For
example, the specificand dew-point spread had three groups. To select a minimum
number of effective specifiers from a large set, a criterion, λ, is used to separate
the effective from the ineffective specifiers. This criterion maximizes the distances
between the mean values of the specificand groups and minimizes the spread of
points about each of the group means. The criterion can be represented as a single
number by computing the following ratio:

$$\lambda = \frac{\text{measure of distance between group means}}{\text{measure of spread of points about each group mean}}$$


The λ criterion is used as follows. Let there be P possible specifiers. The
first step is to compute P values of λ, each based on a single specifier. The first
specifier selected is the one that gives the largest λ. Next, P-1 values of λ are
computed using two specifiers, one of which is the first specifier selected and the
other one of the remaining P-1 specifiers. The one giving the maximum value of λ is
selected as the second specifier. Third and higher specifiers are selected by
computing λ using three or more specifiers. The procedure is continued until a
statistical test indicates that the last specifier selected does not contribute
significant discriminating information.
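The screening loop just described can be sketched in plain Python. This is an illustration under stated assumptions, not Miller's procedure. λ is replaced by a REEP-flavoured stand-in: the ratio of explained to residual sum of squares when the zero-one group indicators are regressed on an intercept plus the chosen specifiers. The significance test is replaced by a fixed minimum gain. All function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ratio_criterion(rows, labels, subset):
    """Stand-in for lambda: explained / residual sum of squares, summed over the
    zero-one group indicators regressed on an intercept plus the chosen specifiers."""
    X = [[1.0] + [float(row[j]) for j in subset] for row in rows]
    k = len(X[0])
    # Normal equations X'X; a tiny ridge (not on the intercept) guards against
    # exactly collinear zero-one specifiers.
    xtx = [[sum(x[i] * x[j] for x in X) + (1e-9 if i == j and i > 0 else 0.0)
            for j in range(k)] for i in range(k)]
    explained = residual = 0.0
    for group in sorted(set(labels)):
        y = [1.0 if lab == group else 0.0 for lab in labels]
        beta = solve(xtx, [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)])
        mean = sum(y) / len(y)
        for x, yi in zip(X, y):
            fit = sum(b * xi for b, xi in zip(beta, x))
            explained += (fit - mean) ** 2
            residual += (yi - fit) ** 2
    return explained / residual if residual > 0 else math.inf


def forward_select(rows, labels, min_gain=0.05):
    """Greedy screening: add the specifier giving the largest criterion until the
    improvement falls below min_gain (a stand-in for the significance test)."""
    chosen, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and not math.isinf(best):
        scores = {j: ratio_criterion(rows, labels, chosen + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        chosen.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return chosen


# Hypothetical data: specifier 1 separates moist (M) from dry (D); specifier 0 is noise.
rows = [(1, 1), (1, 1), (0, 1), (0, 1), (1, 0), (1, 0), (0, 0), (0, 0)]
labels = ["M", "M", "M", "D", "D", "D", "D", "M"]
print(forward_select(rows, labels))  # [1]
```

In the example, the second specifier adds no discriminating information once the first has been chosen, so the screening stops after one step.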

⁴A complete description of MDA is given by Miller [9]. The brief explanation
given here is adapted from Enger [8].
Given a set of specifiers (independent variables), P_1, ..., P_r, the problem
then consists of estimating the probability distribution over a set of G mutually
exclusive and exhaustive groups defined for the specificand. A series of multiple
regressions is performed on G zero-one dependent variables, Y_1, ..., Y_G,
where each dependent variable is associated with one of the G specificand groups.
The independent variables, P_1, ..., P_r, are identical in each of the G regressions.
From the series of regressions, a least-squares estimate of the A's in the following
set of equations is obtained:

$$
\begin{aligned}
E(Y_1 \mid X) &= \sum_{s=0}^{r} A_{1s} P_s \\
E(Y_2 \mid X) &= \sum_{s=0}^{r} A_{2s} P_s \\
&\;\;\vdots \\
E(Y_G \mid X) &= \sum_{s=0}^{r} A_{Gs} P_s
\end{aligned}
\qquad (P_0 \equiv 1) \qquad \text{(IV-2)}
$$
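With a single zero-one specifier, the least-squares solution of (IV-2) can be written down directly, which makes the sum-to-unity property easy to see. The fitted value of each Y_g is simply the relative frequency of group g among the cases that share the specifier's value. A minimal sketch follows; the data and the function name are hypothetical, and exact fractions are used so that the sums come out exactly.

```python
from fractions import Fraction


def reep_single_binary(spec, labels, groups):
    """Least-squares coefficients (A_g0, A_g1) of E(Y_g | P) = A_g0 + A_g1 * P
    for one zero-one specifier P; both levels of P must occur in the data."""
    coeffs = {}
    for g in groups:
        freq = {}
        for level in (0, 1):
            ys = [1 if lab == g else 0 for p, lab in zip(spec, labels) if p == level]
            freq[level] = Fraction(sum(ys), len(ys))  # relative frequency of group g
        coeffs[g] = (freq[0], freq[1] - freq[0])  # (A_g0, A_g1)
    return coeffs


c = reep_single_binary([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 2, 3, 2, 3, 3, 3], [1, 2, 3])
print(sum(a0 for a0, _ in c.values()), sum(a1 for _, a1 in c.values()))  # 1 0
```

The additive constants sum to one and each specifier's coefficients sum to zero. The same pattern appears, within rounding, in Tables X, XII and XIV.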


All of the conditional distributions are Bernoulli (zero-one) distributions. For
a single trial the expectation is equal to the probability that Y = 1. Therefore, the
regression functions yield least-squares estimates of the group probabilities. These
estimates have both desirable and undesirable features. The desirable properties
are that the estimates add up to unity and that they essentially minimize
the Brier-Allen P score. An undesirable property is that the estimates are not
bounded by 0 and 1. To overcome this problem, the estimates are renormalized
by (a) setting all negative estimates equal to zero, (b) setting all estimates greater
than one equal to one, and (c) dividing each estimate by the overall sum.
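The three renormalization steps translate directly into code. This is a minimal sketch; the function name and the example estimates are hypothetical.

```python
def renormalize(estimates):
    """Turn raw REEP estimates into probabilities: (a) negatives -> 0,
    (b) values above 1 -> 1, (c) divide each value by the new sum."""
    clipped = [min(max(p, 0.0), 1.0) for p in estimates]
    total = sum(clipped)
    if total == 0:
        raise ValueError("all estimates were non-positive")
    return [p / total for p in clipped]


# Hypothetical raw estimates for three DPS categories (they sum to 1.0).
print(renormalize([-0.05, 0.30, 0.75]))  # roughly [0.0, 0.286, 0.714]
```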

The REEP procedure for selection of specifiers provides information regarding
the reduction of variance of the specificand that is not available from most other
statistical techniques. The reduction is stratified by the contribution to each
category of the specificand. That is, when a selected specifier discriminates
effectively between one category and the others, the reduction of the variance of
that particular category is greater than that of the others. Similarly, other
specifiers more effectively reduce the variance of other specificand categories.

The  REEP  technique  was  applied  to  each  of  the  residual  samples  independently. 
The  specificand  (dew-point  spread)  was  separated  into  3  categories  at  each  of  the  3 
levels.  The  range  of  values  for  the  3  categories  and  a  value  representative  of  the 
range  are  listed  in  Table  IX. 


TABLE IX

DEW-POINT SPREAD CATEGORIES FOR REEP EXPERIMENTS

Level (mb)   Category   Range of values (°C)   Representative value (°C)
850             1        0 ≤ DPS ≤ 6                     3
                2        6 < DPS ≤ 13                   10
                3        13 < DPS                       18
700             1        0 ≤ DPS ≤ 6                     3
                2        6 < DPS ≤ 14                   11
                3        14 < DPS                       19
500             1        0 ≤ DPS ≤ 8                     4
                2        8 < DPS ≤ 15                   12
                3        15 < DPS                       20
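Table IX can be expressed as a small lookup. The sketch assumes that the upper limit of categories 1 and 2 is inclusive, as written in the table, and the names are hypothetical.

```python
# Inclusive upper limits (degC) of categories 1 and 2, and the representative
# values of categories 1-3, for each level in Table IX.
DPS_CATEGORIES = {
    850: ((6, 13), (3, 10, 18)),
    700: ((6, 14), (3, 11, 19)),
    500: ((8, 15), (4, 12, 20)),
}


def dps_category(level_mb, dps):
    """Return the Table IX category (1, 2 or 3) of a dew-point spread."""
    (upper1, upper2), _ = DPS_CATEGORIES[level_mb]
    if dps <= upper1:
        return 1
    if dps <= upper2:
        return 2
    return 3


def representative_dps(level_mb, category):
    """Representative value (degC) used for a diagnosed category."""
    return DPS_CATEGORIES[level_mb][1][category - 1]


print(dps_category(700, 14), representative_dps(700, 2))  # 2 11
```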

6.  850-mb  Residual  Sample 


The  850-mb  REEP  equations  were  developed  from  a  residual  sample  consisting 
of  5400  cases  in  the  dependent  sample  and  1386  cases  in  the  independent  sample. 
Approximately  20%  of  the  residual  sample  was  set  aside  as  independent  data. 

The surface DPS categories describing dry and moist conditions were selected
first and third, respectively, as specifiers of the 850-mb DPS. These and the other
surface variables selected as significant specifiers of 850-mb DPS are listed in
Table X. Included are the category (dummy variable) of the surface variable that was
selected and the coefficients associated with each selected specifier in the REEP
equations for the 3 categories of the specificand. A positive coefficient increases the
probability of the given category occurring, while a negative coefficient decreases it.
TABLE X

850-mb SELECTED VARIABLES AND ASSOCIATED COEFFICIENTS

                                             Coefficients of REEP equations
Order   Selected variable   Range of values     Cat. 1    Cat. 2    Cat. 3
  1     DPS                 15 < DPS             -.095     -.249      .344
  2     h                   9                    -.206      .048      .159
  3     DPS                 0 ≤ DPS ≤ 5           .296     -.123     -.173
  4     CM                  0                    -.096      .030      .066
  5     T                   20 < T ≤ 30           .081     -.020     -.061
  6     DPS                 5 < DPS ≤ 12          .130     -.043     -.087
  7     ww                  0 ≤ ww ≤ 1            .082     -.016     -.066
  8     Nt                  5 ≤ Nt ≤ 7            .076     -.039     -.037
  9     Td                  10 < Td ≤ 20          .058     -.031     -.027
 10     T                   T ≤ 0                 .085      .087     -.172
 11     Td                  -10 < Td ≤ 0         -.030     -.071      .101
 12     CL                  6                    -.133     -.053      .186
Additive constants                                .283      .431      .286

Note that the first and third specifiers selected make the largest positive contributions
to the probability of occurrence of specificand categories 3 and 1, respectively.
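Applying the equations means adding the additive constant and the coefficients of every specifier that occurs, then renormalizing as described in Section IV. The sketch below uses the Table X coefficients. The dictionary layout and function name are assumptions, and specifiers are identified only by their order of selection. In the example, a surface DPS above 15°C (1st specifier) occurs together with stratus, CL = 6 (12th).

```python
# Table X coefficients (Cat. 1, Cat. 2, Cat. 3), keyed by order of selection.
COEF_850 = {
    1: (-0.095, -0.249, 0.344),   # surface DPS > 15
    2: (-0.206, 0.048, 0.159),    # h = 9
    3: (0.296, -0.123, -0.173),   # 0 <= surface DPS <= 5
    4: (-0.096, 0.030, 0.066),    # CM = 0
    5: (0.081, -0.020, -0.061),   # 20 < T <= 30
    6: (0.130, -0.043, -0.087),   # 5 < surface DPS <= 12
    7: (0.082, -0.016, -0.066),   # 0 <= ww <= 1
    8: (0.076, -0.039, -0.037),   # 5 <= Nt <= 7
    9: (0.058, -0.031, -0.027),   # 10 < Td <= 20
    10: (0.085, 0.087, -0.172),   # T <= 0
    11: (-0.030, -0.071, 0.101),  # -10 < Td <= 0
    12: (-0.133, -0.053, 0.186),  # CL = 6
}
CONST_850 = (0.283, 0.431, 0.286)


def reep_850(active):
    """Category probabilities given the set of specifiers that occur."""
    raw = list(CONST_850)
    for order in active:
        raw = [r + c for r, c in zip(raw, COEF_850[order])]
    clipped = [min(max(p, 0.0), 1.0) for p in raw]  # renormalization steps (a), (b)
    total = sum(clipped)
    return [p / total for p in clipped]              # step (c)


print([round(p, 3) for p in reep_850({1, 12})])  # [0.055, 0.129, 0.816]
```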

Of the several other specifiers selected, noteworthy additional contributions
to the categorical probabilities were made by the 2nd, 6th, 10th, and 12th specifiers.
The occurrence of the 2nd specifier (cloud base > 8000 ft) indicates a lower probability
of Category 1 (DPS ≤ 6) and a higher probability of Category 3 (DPS > 13).

The increased likelihood of moist conditions at 850 mb when the surface dew-point
spread falls in the range 5 < DPS ≤ 12 (6th selected specifier) can be
explained by the adiabatic cooling and the increase of relative humidity with height
typically present beneath a warm-season inversion. Thus, a surface DPS between 5 and
12°C would most frequently be associated with an 850-mb DPS in the range
described by Category 1 of the specificand. The characteristics of the 10th
specifier (T ≤ 0°C) could be attributed to small polar outbreaks in which considerable
overrunning of the shallow air mass results in at least somewhat moist conditions
at 850 mb. Finally, the high probability of dryness and low probability of
moistness at 850 mb when low-cloud type 6 (stratus) is present (12th selected
specifier) is related to the low base of the subsidence inversion generally
associated with stratus.
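The reasoning for the 6th specifier can be made concrete with a textbook idealization that is not part of this report. In a well-mixed, unsaturated layer the temperature falls at the dry-adiabatic rate (about 9.8°C per km), while the dew point falls by only about 1.8°C per km. The spread therefore shrinks by roughly 8°C per kilometre, which is the familiar rule that cloud base lies about 125 m above the surface per 1°C of spread. The sketch assumes that the mixed layer reaches 850 mb, taken here to be about 1.5 km above a near-sea-level station.

```python
DRY_ADIABATIC_LAPSE = 9.8  # degC per km (standard value)
DEW_POINT_LAPSE = 1.8      # degC per km, approximate for unsaturated mixed air
HEIGHT_850_KM = 1.5        # assumed height of 850 mb above the station


def spread_aloft(surface_dps, height_km=HEIGHT_850_KM):
    """Idealized DPS of surface air lifted dry-adiabatically to height_km.
    Zero means the air saturated (reached cloud base) at or below that height."""
    decrease = (DRY_ADIABATIC_LAPSE - DEW_POINT_LAPSE) * height_km
    return max(0.0, surface_dps - decrease)


for dps in (5, 12, 15):
    print(dps, spread_aloft(dps))
```

Under these assumptions, any surface spread up to about 12°C is used up by 850 mb. This is consistent with the tendency toward Category 1 noted above, although real mixed layers often stop short of 850 mb.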

Testing of the 850-mb REEP equations to diagnose 3 categories of 850-mb
DPS was conducted with both the dependent and independent data samples (5400 and
1386 cases, respectively). The results were evaluated with contingency tables
and are shown in Table XI.

TABLE XI

850-mb RESIDUAL SAMPLE (specification of 3 categories of DPS)

(a) Dependent-data specification of dew-point spread

                          Observed
Specified            1       2       3     Total specified
1                 1254     700     342          2296
2                  533     910     708          2151
3                   53     244     656           953
Total observed    1840    1854    1706          5400

Number of hits 2820
Percent correct 52.2

(b) Independent-data specification of dew-point spread

                          Observed
Specified            1       2       3     Total specified
1                  296     196      90           582
2                  140     258     209           607
3                   11      48     138           197
Total observed     447     502     437          1386

Number of hits 692
Percent correct 49.9
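The scores in contingency tables such as Table XI follow from simple sums. The number of hits is the diagonal, and comparing row and column totals shows which categories are over- or underspecified. A sketch using the Table XI counts (the names are hypothetical):

```python
def score(table):
    """table[i][j]: cases specified in category i+1 and observed in category j+1.
    Returns (hits, percent correct, totals specified, totals observed)."""
    n = sum(sum(row) for row in table)
    hits = sum(table[i][i] for i in range(len(table)))
    specified = [sum(row) for row in table]
    observed = [sum(col) for col in zip(*table)]
    return hits, round(100.0 * hits / n, 1), specified, observed


DEPENDENT_850 = [[1254, 700, 342], [533, 910, 708], [53, 244, 656]]
INDEPENDENT_850 = [[296, 196, 90], [140, 258, 209], [11, 48, 138]]

print(score(DEPENDENT_850))    # (2820, 52.2, [2296, 2151, 953], [1840, 1854, 1706])
print(score(INDEPENDENT_850))  # (692, 49.9, [582, 607, 197], [447, 502, 437])
```

Category 1 (moist) is specified more often than it is observed, and category 3 (dry) less often, which is the tendency described below.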

The equations tend to overspecify moist conditions and underspecify dry
conditions in both the dependent and independent data samples. The percentage
correct on independent data is slightly lower than on the dependent data (49.9%
compared with 52.2%). Because the diagnoses obtained with the REEP equations
are of variable quality, their use in an operational system should be selective, as
was recommended earlier [2]. That is, the diagnosed DPS should be used only if
the probability of occurrence of a given category exceeds a minimum value (0.50,
for example). Using the REEP equations in this manner will reduce the number of
diagnoses retained, but those retained will be of higher quality and will yield higher
percent-correct scores than would be obtained by using the equations in every
case.
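Selective use amounts to a single threshold test on the most probable category. A sketch, assuming that "exceeds" means a strict inequality and taking the 0.50 value and the Table IX representative values from the text (the function name is hypothetical):

```python
def selective_diagnosis(probabilities, representative_values, minimum=0.50):
    """Keep a REEP diagnosis only if its most probable category exceeds `minimum`.
    Returns (category, representative DPS), or None when the diagnosis is discarded."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] > minimum:
        return best + 1, representative_values[best]
    return None


REPRESENTATIVE_850 = (3, 10, 18)  # Table IX, degC
print(selective_diagnosis([0.055, 0.129, 0.816], REPRESENTATIVE_850))  # (3, 18)
print(selective_diagnosis([0.40, 0.35, 0.25], REPRESENTATIVE_850))     # None
```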

7. 700-mb Residual Sample

The 700-mb REEP equations were developed from a residual sample comprising
6408 cases in the dependent sample and 1569 cases in the independent sample.

The categories of the surface-observed variables selected as significant
specifiers of the 700-mb DPS are listed in Table XII. Also included are the
coefficients of the 15 selected specifiers associated with the 3 categories of 700-mb
dew-point spread.

TABLE XII

700-mb SELECTED VARIABLES AND ASSOCIATED COEFFICIENTS

                                             Coefficients of REEP equations
Order   Selected variable   Range of values     Cat. 1    Cat. 2    Cat. 3
  1     CM                  0                    -.158     -.003      .161
  2     W                   0                    -.121      .040      .081
  3     DPS                 0 ≤ DPS ≤ 4           .093     -.038     -.055
  4     CL                  0                    -.128     -.068      .195
  5     P                   1020 < P             -.153      .005      .148
  6     Td                  20 < Td              -.064      .112     -.048
  7     DPS                 15 < DPS             -.100     -.061      .160
  8     h                   4 ≤ h ≤ 6            -.093     -.063      .156
  9     P                   1010 < P ≤ 1020      -.086      .018      .068
 10     W                   3 ≤ W ≤ 5            -.127      .024      .103
 11     W                   8                     .104     -.087     -.017
 12     FF                  0 < FF ≤ 3            .033      .024     -.057
 13     CH                  2                    -.017      .079     -.061
 14     Nt                  1 ≤ Nt ≤ 4           -.043      .026      .016
 15     CH                  7 ≤ CH ≤ 9           -.110      .035      .075
Additive constants                                .578      .374      .048
The first two specifiers selected (CM = 0 and W = 0), when they occur, reduce
the probability of moist conditions and increase the probability of dry conditions
at mid-tropospheric levels (700 mb). This trend is reinforced by the 4th specifier
(CL = 0) and the 5th specifier (1020 < P). Looking at these four specifiers
jointly, one can formulate a very logical synoptic situation for the warm season.
Surface pressures in excess of 1020 mb are most frequently associated with the
centers of air masses that are predominantly dry above the subsidence inversion,
which often extends downward to the surface in the general area of the high-pressure
center. Thus, there is little or no cloudiness (CL = CM = 0) for periods of
several hours (W = 0). Note that the contributions of the surface DPS categories (3rd
and 7th specifiers) to the categorical probabilities are smaller than at 850 mb,
as one would expect. Of the 15 selected specifiers, only the past occurrence
of showers (W = 8) increases the probability of the moist category of 700-mb
DPS by more than 10 percentage points. Looking at the list of selected variables,
it is seen that the occurrence of most of them would indicate dry conditions at
700 mb. When most of these variables do not occur, moist conditions would be
diagnosed because of the high additive constant for the moist category (0.578).
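The synoptic picture above can be checked numerically with the Table XII coefficients. The sketch computes raw estimates (before renormalization) for the case in which none of the listed specifiers occurs and for a clear-sky high-pressure case. Only four of the fifteen specifiers are included, and the names are hypothetical.

```python
# Table XII additive constants and four of its specifiers (Cat. 1, Cat. 2, Cat. 3).
CONST_700 = (0.578, 0.374, 0.048)
COEF_700 = {
    "CM=0": (-0.158, -0.003, 0.161),
    "W=0": (-0.121, 0.040, 0.081),
    "CL=0": (-0.128, -0.068, 0.195),
    "P>1020": (-0.153, 0.005, 0.148),
}


def raw_estimates_700(active):
    """Additive constant plus the coefficients of every specifier that occurs."""
    return [const + sum(COEF_700[name][cat] for name in active)
            for cat, const in enumerate(CONST_700)]


# No listed specifier occurs: the constants alone favour the moist category.
print([round(v, 3) for v in raw_estimates_700([])])
# No low or middle cloud, no past weather, surface pressure above 1020 mb.
print([round(v, 3) for v in raw_estimates_700(["CM=0", "W=0", "CL=0", "P>1020"])])
```

The first line reproduces the additive constants (0.578, 0.374, 0.048). With the four dry-side specifiers, the raw estimates become about 0.018, 0.348 and 0.633. These sum to 0.999 rather than 1 because the published coefficients are rounded.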

The REEP equations developed for the diagnosis of three categories of 700-mb
DPS were tested on the dependent sample of 6408 cases and the independent sample
of 1569 cases. The results are presented in Table XIII. The 700-mb equations
yield scores slightly lower than those at 850 mb and, as at 850 mb, the scores were
slightly lower on independent data (47.9% compared with 50.7%). The
700-mb equations overspecify (more specified than observed) the dry category,
while they underspecify the middle category (6 < DPS ≤ 14). Here again, using the
diagnoses selectively (as suggested earlier [2]) on the basis of the probability
of occurrence will result in fewer but more reliable diagnoses.

8.  500-mb  Residual  Sample 

The 500-mb REEP equations were developed from a residual sample of 5364
cases in the dependent sample and 1361 cases in the independent sample. As at
the other levels, part of the data was set aside to test the 500-mb REEP equations.


TABLE XIII

700-mb RESIDUAL SAMPLE

(a) Dependent-data specification of dew-point spread

                          Observed
Specified            1       2       3     Total specified
1                  682     417     267          1366
2                  280     552     372          1204
3                  518    1307    2013          3838
Total observed    1480    2276    2652          6408

Number of hits 3247
Percent correct 50.7

(b) Independent-data specification of dew-point spread

                          Observed
Specified            1       2       3     Total specified
1                  134     110      62           306
2                   56     110      96           262
3                  129     364     508          1001
Total observed     319     584     666          1569

Number of hits 752
Percent correct 47.9

Table XIV lists the 15 selected specifiers with their regression coefficients
for the 3 categories of 500-mb DPS.

Quite logically, the presence or absence of middle clouds (CM ≠ 0 or CM = 0) is
the best single discriminator between moist and dry conditions at 500 mb (and also
at 700 mb). While categories of surface DPS were significant specifiers of 850-mb
humidity and of lesser importance at 700 mb, they were not selected at all as
specifiers of 500-mb DPS. Low-cloud types 3 and 9 (cumulonimbus clouds of
considerable vertical extent) make almost identical contributions to the specificand
categories, particularly the moist category. Also making a positive contribution to
the moist category was the grouping of precipitation occurrences, excluding drizzle
(59 ≤ ww < 99). The three surface-temperature terms selected illustrate an
interesting association between two seemingly unrelated variables.
TABLE XIV

500-mb SELECTED VARIABLES AND ASSOCIATED COEFFICIENTS

                                             Coefficients of REEP equations
Order   Selected variable   Range of values     Cat. 1    Cat. 2    Cat. 3
  1     CM                  0                    -.142      .002      .140
  2     Nt                  5 ≤ Nt < 8            .098     -.067     -.031
  3     Td                  10 < Td ≤ 20         -.003     -.080      .084
  4     CL                  9                     .182     -.049     -.133
  5     T                   20 < T ≤ 30          -.089     -.123      .212
  6     T                   10 < T ≤ 20          -.062     -.082      .145
  7     W                   0 ≤ W ≤ 1            -.074      .027      .047
  8     CH                  2                     .076      .034     -.109
  9     T                   30 < T               -.170     -.133      .303
 10     CL                  3                     .180     -.082     -.098
 11     h                   9                     .033     -.077      .044
 12     ww                  59 ≤ ww < 99          .161     -.062     -.098
 13     Nh                  8                     .052     -.142      .090
 14     CM                  / (unknown)          -.249      .089      .160
 15     CH                  / (unknown)           .149     -.020     -.129
Additive constants                                .392      .518      .090

These results clearly suggest that the higher the surface temperature, the drier
the air is likely to be at 500 mb (approximately 18,000 ft). Note that the probability
of the dry category increases more when the 9th specifier (30 < T) occurs
than when the 5th (20 < T ≤ 30) occurs, which in turn exceeds the contribution of the
6th (10 < T ≤ 20).

High surface temperatures frequently occur in large tropical or subtropical
high-pressure air masses well removed from frontal boundaries and the
associated overrunning moisture. Further, these large air-mass systems are
frequently characterized by subsidence at high levels, accounting for the increased
likelihood of dry conditions at 500 mb. Finally, the 14th and 15th specifiers warrant
discussion. The occurrence of CH = unknown when CM is known suggests cloudy
conditions at or near 500 mb and understandably results in an increased likelihood
of moist conditions at 500 mb. However, when CM is unknown there must exist
a lower overcast; therefore, CH must also be unknown. In assessing the contribution
of CM = unknown, one must consider it jointly with CH = unknown, in which case one
finds a decreased probability of Category 1 (-.100) and increased probabilities of
Categories 2 and 3 (+.069 and +.031, respectively).
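Because an unknown CM implies an unknown CH, the two specifiers always enter the 500-mb equations together, and their joint effect is simply the sum of their coefficients. A short check against Table XIV (the names are hypothetical):

```python
# Table XIV coefficients (Cat. 1, Cat. 2, Cat. 3).
CM_UNKNOWN = (-0.249, 0.089, 0.160)  # 14th specifier, CM = /
CH_UNKNOWN = (0.149, -0.020, -0.129)  # 15th specifier, CH = /


def joint_contribution(*specifiers):
    """Net change in each category's estimate when all given specifiers occur."""
    return [round(sum(values), 3) for values in zip(*specifiers)]


print(joint_contribution(CM_UNKNOWN, CH_UNKNOWN))  # [-0.1, 0.069, 0.031]
```

As with every specifier in the REEP equations, the three joint changes sum to zero, so probability is only shifted between categories.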

The 500-mb REEP equations were applied to the dependent and independent
samples of 5364 and 1361 cases, respectively. The results are presented in
Table XV. Unlike the 850- and 700-mb equations, the REEP equations for 500-mb
DPS diagnosis yield a distribution of diagnoses very similar to the observed
distribution. Further, the results on independent data were slightly better than the
dependent results (49.4% compared with 48.5%). It is suggested that the 500-mb
REEP equations be used selectively, as recommended for the other levels; again,
doing so would yield fewer diagnoses of higher quality.
TABLE XV

500-mb RESIDUAL SAMPLE

(a) Dependent-data specification of dew-point spread

                              Observed
Specified              1       2       3     Total specified
    1                992     598     365        1955
    2                507     812     448        1767
    3                353     492     797        1642
Total observed      1852    1902    1610        5364

Number of hits: 2601     Percent correct: 48.5

(b) Independent-data specification of dew-point spread

                              Observed
Specified              1       2       3     Total specified
    1                264     137      92         493
    2                127     177     113         417
    3                 97     123     231         451
Total observed       488     437     436        1361

Number of hits: 672      Percent correct: 49.4
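The percent correct scores in Table XV are the number of hits (diagonal cases, where the specified category matches the observed one) divided by the total number of cases. A minimal Python sketch of that arithmetic, using the Table XV counts; the function name `percent_correct` is ours, not the report's:

```python
def percent_correct(table):
    """Return (hits, percent correct) for a square contingency table.

    table[s][o] is the number of cases specified in category s+1 and
    observed in category o+1; hits are the diagonal entries.
    """
    hits = sum(table[k][k] for k in range(len(table)))
    total = sum(sum(row) for row in table)
    return hits, 100.0 * hits / total


# Table XV(a), dependent data: rows = specified 1-3, columns = observed 1-3
dependent = [[992, 598, 365],
             [507, 812, 448],
             [353, 492, 797]]
# Table XV(b), independent data
independent = [[264, 137, 92],
               [127, 177, 113],
               [97, 123, 231]]

for name, table in (("dependent", dependent), ("independent", independent)):
    hits, pc = percent_correct(table)
    print(f"{name}: {hits} hits, {pc:.1f}% correct")
```

Recomputed this way, the dependent sample gives 2601 hits out of 5364 cases (48.5%) and the independent sample 672 hits out of 1361 cases (49.4%), matching the tabulated scores.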
SECTION V


INDEPENDENT  DATA  TESTING  AND  RECOMMENDATIONS 


In the original formulation of this study, the plan was to develop, by objective
means, diagnostic relationships between surface-observed variables and the DPS
at 850, 700, 500, and 400 mb for both the cold and the warm seasons of the year.
Cold-season relationships had been developed for the four levels and reported
earlier [2]. Because of the nature of the upper-air data available for developing
the warm-season diagnostic relationships, it was felt that satisfactory relationships
could not be developed for 400 mb; the reasons are discussed in Section II.
However, a means of diagnosing DPS at 400 mb during the warm season was still
required. The available alternatives were (a) to apply the 400-mb decision tree
developed for the cold season to all months of the year, or (b) to apply the 500-mb
decision tree developed for the warm season to both the 500- and 400-mb levels
during the warm months of the year.

Teletype data for June, July, and the first part of August 1965 were processed
and evaluated to determine the better procedure. Approximately 25 U.S. stations
routinely reporting surface-synoptic and upper-air data twice daily were used for
the evaluation.

Decision-tree diagnoses were obtained from the warm-season 500-mb decision
tree and compared with the observed 400-mb DPS. Similarly, cold-season 400-mb
decision-tree diagnoses were obtained and compared with the observed 400-mb
DPS. The results for the two-and-a-half-month sample are summarized in Table
XVI. Diagnoses made with both decision trees are classified as either moist or dry,
while the observations have been tabulated in three categories: moist (0-5°C),
marginal (6-14°C), and dry (>14°C).

Comparing the diagnoses made by the two decision trees, one finds little
difference. The 400-mb cold-season decision tree yields slightly more reliable
diagnoses (a percent correct score 2.2 percentage points higher), but the
500-mb warm-season decision tree yields more diagnoses of comparable quality
(1035 diagnoses compared with 923). The choice of decision tree for diagnosing
400-mb DPS in the warm season therefore reduces exclusively to operational
considerations, since there is no justifiable meteorological reason for choosing
one over the other.

TABLE XVI

COMPARISON OF TWO METHODS FOR DIAGNOSING
400-mb DEW-POINT SPREAD

(a) Warm-season 500-mb decision tree

                           Observed
Diagnosed         Moist   Marginal    Dry     Total diagnosed
Moist             [134]      191        82         407
Dry                   2      174      [452]        628
Total observed      136      365       534        1035

Hits = 586; % correct = 56.6

(b) Cold-season 400-mb decision tree

                           Observed
Diagnosed         Moist   Marginal    Dry     Total diagnosed
Moist             [100]      143        56         299
Dry                   5      176      [443]        624
Total observed      105      319       499         923

Hits = 543; % correct = 58.8

[ ] Denotes correct diagnoses (hits)

A second problem that was investigated concerned the justification for developing
relationships for separate seasons (cold and warm). For example, does the warm-
season 500-mb decision tree yield more and better diagnostic estimates of the
500-mb DPS during the summer months than the cold-season 500-mb decision tree?

The warm-season and cold-season 500-mb decision-tree diagnoses were each
compared with the observed 500-mb DPS. Table XVII summarizes the results of
this comparison for the two and a half months of observations. Here again, the
diagnoses are classified as moist or dry and the observations as moist, marginal,
or dry.

TABLE XVII

COMPARISON OF TWO METHODS FOR DIAGNOSING
500-mb DEW-POINT SPREAD

(a) Warm-season 500-mb decision tree

                           Observed
Diagnosed         Moist   Marginal    Dry     Total diagnosed
Moist             [178]      157        72         407
Dry                   2      151      [475]        628
Total observed      180      308       547        1035

Hits = 653; % correct = 63.1

(b) Cold-season 500-mb decision tree

                           Observed
Diagnosed         Moist   Marginal    Dry     Total diagnosed
Moist             [125]       95        35         255
Dry                   5      128      [340]        473
Total observed      130      223       375         728

Hits = 465; % correct = 63.9

[ ] Denotes correct diagnoses (hits)

The  difference  in  percent  correct  scores  realized  by  the  two  decision  trees  is 
negligible.  However,  the  warm-season  decision  tree  yielded  over  300  more  diagnoses 
of  the  500-mb  DPS  than  did  the  cold-season  decision  tree. 
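In Tables XVI and XVII the decision trees diagnose only moist or dry, while the observations fall into three categories, so a diagnosis counts as a hit only when a moist diagnosis meets a moist observation or a dry diagnosis meets a dry one; a marginal observation can never be a hit. A short Python sketch reproducing the Table XVII comparison under that reading (`score_moist_dry` is a hypothetical helper name):

```python
def score_moist_dry(moist_row, dry_row):
    """Score a moist/dry diagnosis table against 3-category observations.

    Each row gives observed counts as (moist, marginal, dry).
    Returns (number of diagnoses, hits, percent correct).
    """
    hits = moist_row[0] + dry_row[2]      # marginal column never scores
    total = sum(moist_row) + sum(dry_row)
    return total, hits, 100.0 * hits / total


warm = score_moist_dry((178, 157, 72), (2, 151, 475))   # Table XVII(a)
cold = score_moist_dry((125, 95, 35), (5, 128, 340))    # Table XVII(b)
for name, (n, hits, pc) in (("warm-season", warm), ("cold-season", cold)):
    print(f"{name}: {n} diagnoses, {hits} hits, {pc:.1f}% correct")
print("additional warm-season diagnoses:", warm[0] - cold[0])
```

The 0.8-point difference in score (63.1% versus 63.9%) is small next to the 307 additional diagnoses produced by the warm-season tree.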

Comparative contingency tables were also obtained separately for the June, July,
and August data. In all months, a greater number of diagnoses was obtained using
the warm-season decision tree. In July and August the percent correct scores were
also higher, but this was not the case in June. This result is interesting in view of
the fact that the warm-season decision tree was developed from data largely limited
to the months of September and October. It may suggest that the warm-season
decision trees should be used from July through October or November, with the
cold-season trees employed for the remainder of the year. More extensive
comparative testing with independent data is required to support this tentative
conclusion fully.

The reason for developing reliable diagnostic estimates of DPS was to increase
the areal coverage of humidity information for input to an objective humidity
analysis technique. It is therefore recommended that the warm-season 500-mb
decision tree be used during the warmer months of the year, since it generates
more diagnoses per map time. Time did not permit a similar comparison at 850
and 700 mb. However, since the warm-season and cold-season decision trees
developed for these two levels differ in ways justified by meteorological reasoning,
it is recommended that the 850- and 700-mb decision trees developed from
warm-season data be used during the warm season and those developed from
cold-season data be used during the cold season. As explained earlier, the choice
of decision tree for diagnosing 400-mb dew-point spread during the summer season
must be made on operational grounds.
SECTION VI


HUMIDITY  DIAGNOSIS  AND  ANALYSIS  TECHNIQUE 

The preceding three sections discuss the development and testing of the
diagnostic relationships for the warm season of the year. The following sections
present the results of experiments in which diagnostic data obtained from the
cold-season relationships [2] were used in analyses of DPS.

The data processing required for this phase of the study was discussed generally
in Section II and shown in a flow chart (Fig. 3). This section discusses in detail the
logic required for this data processing and for the developmental testing described
later. The first three processing procedures perform data processing and
computations on the three basic sets of data (surface, upper-air, and CPS). The
fourth and fifth combine and analyze these data in various ways.

Surface-synoptic station data in the European area (see Fig. 2) were extracted
for the period 00Z Feb 6 through 12Z Feb 16, 1962 (22 observation times). The
required surface data were unpacked and checked to determine whether any of the
variables to be used in the diagnostic relationships were missing.

9. Humidity Diagnostic Procedure

The humidity diagnostic procedure generates diagnostic estimates of the DPS
at 850, 700, and 500 mb. The procedure is as follows. First, the cold-season
decision trees developed for the three levels [2] are applied to the surface
observations at each station within the developmental analysis area. Second, if a
decision-tree estimate cannot be obtained for a given level, the REEP equation
applicable to that level [2] is applied to the surface observation, yielding
probabilities for the three categories of DPS at that level. These probabilities are
examined for an "occurrence" diagnosis; i.e., the probability of occurrence for a
given category must equal or exceed a specified value (usually 0.50). In the absence
of an "occurrence" diagnosis, the probabilities are examined for "non-occurrence"
diagnoses, in which the probability of a category must be less than or equal to another
specified value (usually 0.10).⁵ For each case in which a decision-tree or REEP
diagnosis is made, the following information is saved for later use: station name
and location, the diagnosed DPS, and its probability of occurrence. The probability
of occurrence is obtained from the REEP equation, if used; if a decision-tree
diagnosis was made, an indicator of 1.0 is used. This procedure required options
governing the levels to be processed and the types of diagnostic estimates to be
attempted in any given experiment; for example, one experiment might consider
only decision-tree diagnoses, while another might consider decision-tree and REEP
"occurrence" diagnoses only.
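The order of tests in the diagnostic procedure (decision tree first, then a REEP "occurrence" diagnosis, then a "non-occurrence" diagnosis) can be sketched in Python as below. This is an illustrative sketch, not the report's program: `tree` and `reep` are hypothetical stand-ins for the level-specific decision tree and REEP equation, the thresholds are the "usual" values quoted above, and non-occurrence is tested only for Categories 1 and 3, as the following paragraph explains.

```python
OCCURRENCE_MIN = 0.50      # accept a category if its probability >= this
NON_OCCURRENCE_MAX = 0.10  # rule out a category if its probability <= this


def diagnose(obs, tree, reep):
    """Return one diagnosis for a single level as a tuple, or None.

    tree(obs) -> DPS in deg C, or None if no decision-tree branch applies.
    reep(obs) -> (p1, p2, p3) category probabilities, or None if a
                 required surface variable is missing.
    """
    dps = tree(obs)
    if dps is not None:
        return ("tree", dps, 1.0)          # indicator of 1.0 for tree diagnoses

    probs = reep(obs)
    if probs is None:
        return None

    best = max(range(3), key=lambda k: probs[k])
    if probs[best] >= OCCURRENCE_MIN:
        return ("occurrence", best + 1, probs[best])

    for k in (0, 2):                       # Categories 1 and 3 only
        if probs[k] <= NON_OCCURRENCE_MAX:
            return ("non-occurrence", k + 1, probs[k])
    return None


def no_tree(obs):
    return None                            # stand-in: no tree branch applies


print(diagnose({}, no_tree, lambda obs: (0.05, 0.62, 0.33)))  # ('occurrence', 2, 0.62)
print(diagnose({}, no_tree, lambda obs: (0.08, 0.45, 0.47)))  # ('non-occurrence', 1, 0.08)
```

The cascade makes explicit that a REEP result is consulted only when the decision tree gives no answer, so every tree diagnosis carries full reliability (1.0).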

Applying the decision trees and three-category REEP equations requires certain
assumptions; e.g., in the 500-mb decision tree a DPS value of 20°C is used for
cases in which "motorboating" is diagnosed. The diagnosed values of DPS used for
the non-motorboating branches of the 500-mb decision tree and for the 850- and
700-mb decision trees were those suggested in [2]. Further assumptions in the
application of the REEP equations are summarized in Table XVIII. Each of the three
categories of DPS represents a range of values with categorical limits, as defined in
Table XVIII, but a single numerical value must be used to represent a category when
a diagnosis is made. Table XVIII includes two values for each category: one used
when an "occurrence" diagnosis is made and the other used when a "non-occurrence"
diagnosis is made. A limiting value of DPS is assigned to Categories 1 and 3. For
example, at the 700-mb level, if Category 1 is diagnosed not to occur, the limiting
value indicates that the DPS is diagnosed to be at least 7°C. Since a non-occurrence
diagnosis of Category 2 gives no specific information about DPS, non-occurrence
diagnoses are not made for this category. It should be remembered that an
occurrence or non-occurrence diagnosis is made only if a decision-tree diagnosis is
not possible. The specified probability for accepting "occurrence" or
"non-occurrence" diagnoses is also included. Finally, a REEP equation can be
applied only if all the surface variables included in the equation are present in the

⁵Such a category would, therefore, have a high probability of not occurring.
TABLE XVIII

SUMMARY  OF  INFORMATION  REQUIRED  FOR  THE  USE  OF  REEP  EQUATIONS 
surface observation. The specific variables needed for the cold-season REEP
equations are also listed in this table.

10. Radiosonde Extraction and Error Checking

The radiosonde data at stations in the developmental area were extracted.

The DPS is computed at the 850-, 700-, and 500-mb levels from the temperature
and dew-point observations. When the temperature and height are reported but
the dew point is missing, motorboating is assumed and a DPS value of 20°C is
inserted if the temperature is greater than -40°C. At each level, each station
report of DPS is compared with an average value of DPS computed from a minimum
of 8 stations in the vicinity. The error-checking procedure requires that the
difference between the DPS value being checked and the average DPS [11] be no
more than 15°C. If the station report does not satisfy that requirement, it is
discarded. In this manner, erroneous or unrepresentative DPS values are eliminated.
Table XIX lists, for each time period of the data sample, the total number of
radiosonde reports available at each level within the area used in the developmental
testing after error checking was completed. The number of station reports eliminated
because of missing or erroneous data was low. At an average observation time,
about 2 stations were eliminated because the temperature was missing or the
computed DPS was negative, and about 2 more were discarded because the DPS
did not satisfy the 15°C error-checking requirement. The average number of
RAOBS accepted by this program was 61 at 850 mb and 63 at the two higher
levels.
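The level-by-level DPS computation and gross error check can be sketched in Python as follows. The 20°C motorboating value, the -40°C temperature condition, the minimum of 8 neighbouring stations, and the 15°C tolerance come from the text; how neighbours are chosen is not specified, so this sketch assumes the caller passes in the DPS values of at least 8 nearby stations. The function names are hypothetical.

```python
MOTORBOAT_DPS = 20.0     # deg C assumed when the dew point is missing
MIN_TEMP_FOR_MB = -40.0  # motorboating assumed only if T > -40 C
MAX_DEPARTURE = 15.0     # allowed |DPS - neighbour mean|, deg C
MIN_NEIGHBOURS = 8


def level_dps(temp, height, dewpoint):
    """DPS (deg C) at one level, or None if the report is unusable."""
    if temp is None:
        return None                      # missing temperature: discard
    if dewpoint is None:
        if height is not None and temp > MIN_TEMP_FOR_MB:
            return MOTORBOAT_DPS         # assume motorboating
        return None
    dps = temp - dewpoint
    return dps if dps >= 0 else None     # negative spread: discard


def passes_gross_check(dps, neighbour_dps):
    """True if dps lies within 15 C of the mean DPS at nearby stations."""
    if len(neighbour_dps) < MIN_NEIGHBOURS:
        raise ValueError("at least 8 neighbouring reports are required")
    mean = sum(neighbour_dps) / len(neighbour_dps)
    return abs(dps - mean) <= MAX_DEPARTURE
```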

11. Humidity Preprocessing

Radiosonde and diagnostic DPS data are merged and processed. Data at each
level (850, 700, or 500 mb) are processed separately. The processing of the
radiosonde and diagnostic data is discussed separately in the following paragraphs.

The procedure utilizes features of an earlier preprocessing procedure designed
by Thomasell and Welsh [13].

^Original  specifications  by  Frederick  P.  Ostby,  Jr. 
TABLE XIX

NUMBER OF RADIOSONDE OBSERVATIONS AVAILABLE AFTER ERROR CHECKING

   Date        Time          Level (mb)
(February)     (Z)      850     700     500
     6          00       60      64      64
                12       69      71      72
     7          00       49      52      51
                12       38      39      39
     8          00       71      71      71
                12       69      72      73
     9          00       52      56      56
                12       42      44      43
    10          00       72      78      76
                12       67      67      70
    11          00       67      70      72
                12       68      70      68
    12          00       59      62      61
                12       56      61      58
    13          00       61      65      66
                12       65      68      69
    14          00       72      76      72
                12       61      63      61
    15          00       60      60      59
                12       57      57      55
    16          00       65      69      69
                12       62      65      65
In processing the radiosonde data, the first step is to withhold a portion
(controlled by input) of the data from further processing. By withholding varying
percentages of the radiosonde data, one can simulate regions where data are less
dense.

The density of station reports about each grid point is then computed (i.e., a
count is made of the number of reporting stations in a fixed area about each grid
point). A station density is computed for all stations by performing a curvilinear
interpolation using the densities at the four grid points surrounding each station.
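A minimal Python sketch of the station-density step. The report does not give the size or shape of the "fixed area," so a square of hypothetical half-width `half_width` grid units is assumed; the interpolation to each station uses the bilinear form of Eq. (VI-1), which is our assumption for the "curvilinear interpolation" named in the text. Stations are assumed to lie inside the grid.

```python
def grid_density(stations, ni, nj, half_width):
    """Count stations within +/- half_width grid units of each grid point.

    stations: list of (x, y) positions in grid units (x along i, y along j).
    Returns an ni x nj list of counts.
    """
    return [[sum(1 for x, y in stations
                 if abs(x - i) <= half_width and abs(y - j) <= half_width)
             for j in range(nj)]
            for i in range(ni)]


def interpolate(field, x, y):
    """Bilinear value of field at (x, y) from the four surrounding grid points."""
    i = min(int(x), len(field) - 2)      # keep i+1 inside the grid
    j = min(int(y), len(field[0]) - 2)
    di, dj = x - i, y - j
    p1, p2 = field[i][j], field[i + 1][j]
    p3, p4 = field[i + 1][j + 1], field[i][j + 1]
    return p1 + (p2 - p1) * di + (p4 - p1) * dj + (p1 - p2 + p3 - p4) * di * dj


stations = [(0.5, 0.5), (1.2, 0.8), (2.0, 2.0)]
dens = grid_density(stations, ni=3, nj=3, half_width=1.0)
station_density = [interpolate(dens, x, y) for x, y in stations]
```

Here the grid point (1, 1) lies within one grid unit of all three stations, so its count is 3, and the first station's interpolated density is 1.75.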

For  each  radiosonde  station  processed,  the  following  is  output:  station  name 
and  location,  radiosonde  observation  indicator,  station  density,  and  DPS. 

In processing the diagnostic data, the first step is to eliminate a given
percentage of the diagnoses to simulate the data density required for the analysis.
In addition, diagnoses at stations co-located with radiosonde stations are not
used.

The diagnostic probability listed with each diagnosis to be processed is examined
next. All station reports having a probability greater than an input critical
probability are selected and grouped in an occurrence diagnostic list. Those reports
having a probability less than a second critical probability are selected for a
non-occurrence diagnostic list. Both critical probabilities are input values;
therefore, the minimum reliability of the processed diagnostic data can also be
varied.
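The selection of diagnostic data can be sketched in Python as below. The record layout (dicts with hypothetical keys `loc`, `prob`, `dps`), the random thinning, and the fixed seed are assumptions for illustration; the report states only that a given percentage of diagnoses is eliminated, that diagnoses co-located with radiosonde stations are dropped, and that two input critical probabilities split the remainder.

```python
import random


def select_diagnoses(diagnoses, raob_locations, keep_fraction,
                     p_occurrence, p_non_occurrence, seed=0):
    """Thin the diagnoses and split them into occurrence / non-occurrence lists.

    Decision-tree diagnoses carry prob = 1.0, so they pass any p_occurrence < 1.
    """
    rng = random.Random(seed)            # reproducible thinning
    kept = [d for d in diagnoses
            if d["loc"] not in raob_locations and rng.random() < keep_fraction]
    occurrence = [d for d in kept if d["prob"] > p_occurrence]
    non_occurrence = [d for d in kept if d["prob"] < p_non_occurrence]
    return occurrence, non_occurrence
```

Raising `p_occurrence` or lowering `p_non_occurrence` admits only more reliable diagnoses, which is the variable minimum reliability described above.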

The density of station reports in the region about each processed occurrence
diagnosis is computed by a procedure similar to that used for the radiosonde
stations. The diagnostic station density includes both radiosonde stations and
stations with diagnosed DPS.

The occurrence-diagnostic and radiosonde data are grouped together, and the
following information is generated: station name and location, reliability indicator
(REEP probability, or an indication of whether the DPS value was obtained from a
radiosonde observation or a decision tree), station density, and DPS. The
non-occurrence diagnostic data are generated separately and contain the same
information, except that there is no station density.
12. CPS Extraction and Conversion


CPS grid-point data are extracted for the required analysis area and observation
times. The grid field is reordered to be consistent with the other data, and the CPS
values are converted to approximate values of DPS by multiplying by the constants
given in Table XX.

TABLE XX

CONSTANTS USED TO CONVERT CPS TO DPS*

Level (mb)     Constant
   850         -1/12.2
   700         -1/10.0
   500         -1/?.?

*Supplied by USAF Air Weather Service


The grid-point fields of DPS can then be used as an initial guess for the DPS
successive approximation technique (SAT) analysis. As an option, these fields may
be modified with non-occurrence diagnostic data prior to analysis. The details of
this procedure are discussed in the next section.

13.  SAT  Humidity  Analysis 

Prior to performing a successive approximation technique (SAT) analysis of
DPS, an initial-guess field must be obtained. The initial guess consists of values
of DPS at all grid points in the analysis area (the grid spacing used is that of the
NWP grid) and serves as a first approximation of the DPS distribution. The initial
guess may be obtained from CPS and (as an option) non-occurrence diagnostic data.
A SAT analysis is then performed in the analysis area using the analysis stations
(occurrence-diagnostic data and radiosonde stations not withheld). In this analysis,
successive corrections are made to the initial-guess values at the grid points based
on station reports within a radius of influence of each grid point. The size of the
correction depends on (a) the difference between the station value of DPS and
the value obtained at the station location by interpolating from the 4 surrounding
grid points, (b) the distance of the station from the grid point being corrected, and
(c) the reliability of the station data. The SAT analysis technique was developed at the




Joint Numerical Weather Prediction Unit at Suitland, Md. [3] and adopted at TRC
[4], [13] as a basic analysis technique for various meteorological parameters.

A  more  detailed  description  of  the  SAT  analysis  technique  and  its  application  to  the 
analysis  of  DPS  is  given  in  the  following  paragraphs. 

The initial guess to be used in the analysis can be obtained by three techniques:
(a) initial-guess dew-point spread (IGDPS) data (obtained from the CPS field),
unmodified; (b) IGDPS data modified by non-occurrence diagnostic information; and
(c) averaging observations about a grid point. In the third technique, the
initial-guess value assigned to each grid point is the average DPS found at the
stations closest to the grid point. This technique provides a suitable initial guess
in regions where data are not sparse. It is, however, obviously not suitable for large
oceanic regions of the Northern Hemisphere, where radiosonde stations may be many
hundreds of miles apart.

In the first and second techniques, the 12-hr forecasts of CPS that have been
converted to DPS are used for the initial guess. The IGDPS data are unmodified
in the first technique. In the second technique these data are modified with
non-occurrence diagnostic information. If a non-occurrence diagnosis is found within
an area (the size of which is specified by input) centered at a grid point, the
grid-point value of DPS is checked to see whether it is within the range of the
category diagnosed not to occur. If the value is within the limits of this category,
it is adjusted to exceed the upper limit (Category 1 non-occurrence diagnosis) or
to be less than the lower limit (Category 3 non-occurrence diagnosis) of the
category diagnosed not to occur.
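The second initial-guess technique can be sketched in Python as follows. The square search area of hypothetical half-width `half_width` grid units is an assumption (the text says only that its size is specified by input), and the adjusted value is assumed to be the category's limiting value. The 7°C Category 1 limit at 700 mb is from the text; the Category 3 limit used in the example is a placeholder, not a value taken from the report.

```python
def apply_non_occurrence(grid, diagnoses, half_width, cat1_limit, cat3_limit):
    """Move initial-guess DPS values out of categories diagnosed not to occur.

    grid:        list of rows of DPS (deg C), indexed grid[i][j]
    diagnoses:   iterable of (x, y, category), category 1 or 3, (x, y) in grid units
    cat1_limit:  DPS assigned when Category 1 (moist) is ruled out, e.g. 7.0 at 700 mb
    cat3_limit:  DPS assigned when Category 3 (dry) is ruled out
    """
    adjusted = [row[:] for row in grid]              # leave the input untouched
    for i, row in enumerate(adjusted):
        for j, value in enumerate(row):
            for x, y, category in diagnoses:
                if abs(x - i) > half_width or abs(y - j) > half_width:
                    continue                         # diagnosis outside search area
                if category == 1 and value < cat1_limit:
                    value = cat1_limit               # too moist: raise to the limit
                elif category == 3 and value > cat3_limit:
                    value = cat3_limit               # too dry: lower to the limit
            row[j] = value
    return adjusted


guess = [[2.0, 10.0], [25.0, 5.0]]
new = apply_non_occurrence(guess, [(0.0, 0.0, 1), (1.0, 0.0, 3)],
                           half_width=0.5, cat1_limit=7.0, cat3_limit=15.0)
# new == [[7.0, 10.0], [15.0, 5.0]]
```

Only grid values that fall inside a ruled-out category are changed; values already consistent with the diagnosis keep their CPS-derived estimate.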

After the initial-guess field has been established by one of the above techniques,
a successive approximation technique (SAT) analysis is performed. Much of the
following description of the SAT analysis is taken from Davis [4]. For each
analysis station (including diagnostic stations, when used), an interpolated value of
the initial-guess DPS field is computed at the station location by fitting a
curvilinear surface to the four surrounding grid points.

The interpolation equation is

$$S_{ij} = \phi_1 + r\,\Delta i + s\,\Delta j + t\,\Delta i\,\Delta j, \qquad \text{(VI-1)}$$

where $S_{ij}$ is the interpolated value of DPS at a station $(i, j)$, and
$r = \phi_2 - \phi_1$, $s = \phi_4 - \phi_1$, and $t = \phi_1 - \phi_2 + \phi_3 - \phi_4$.
The values of $\phi$ are the initial-guess values of DPS at the four surrounding grid
points, and $\Delta i$ and $\Delta j$ are the component distances to the analysis
station, as shown in Fig. 19.


After the interpolation has been performed, the interpolated value $S_{ij}$ of DPS
is compared with the station value $\delta_{STA}$ of DPS, and the difference is
computed as

$$e_{ij} = \delta_{STA} - S_{ij}. \qquad \text{(VI-2)}$$

The magnitude of $e_{ij}$ reflects the error in the initial guess at the station location.

The error differences at all stations within a radius $R$ of the grid point are used to
correct the initial-guess value at the grid point. The corrections are computed by
the equation

$$C = n^{-1} \sum \mathrm{RWM} \cdot W \cdot e_{ij}, \qquad \text{(VI-3)}$$

Fig. 19. Values required for computing a SAT correction for
grid point (2,2), from Davis [4].
where $n$ is the number of stations within radius $R$ for which errors can be
computed, and $W$ is a weighting function for each station,

$$W = \frac{R^2 - d^2}{R^2 + d^2}, \qquad \text{(VI-4)}$$


with $R$ the radius of search about the grid point and $d$ the distance of the station
from the grid point. The RWM term of Eq. (VI-3) is a relative weighting matrix
correction and is applied to each station. The RWM correction is bounded by 0
(exclusive) and 1. The value to be used for a particular station report is determined
by the station density ($\rho$) and the reliability (RI) of the data. The values of RWM
to use in the analysis are supplied by input from a $\rho \times$ RI matrix. This term is
discussed further in the description of the results of the developmental testing in
the next section.

The value of $C$ computed from Eq. (VI-3) is added to the original value of
DPS at the grid point. This procedure is applied to all grid points in the analysis
area. For the first pass, the initial guess is corrected. The basic procedure of
interpolating, obtaining errors, and making corrections is repeated for the number
of passes specified by input. Normally, the magnitude of $R$ is reduced for each
succeeding pass. Smoothing and verification are possible between passes and after
the final pass.
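The pass structure of Eqs. (VI-1) through (VI-4) can be sketched in Python as below. This is a simplified illustration, not the TRC program: station positions are in fractional grid units, each station's RWM value is supplied directly rather than looked up from the input matrix, the `radii` list stands in for the per-pass input, and smoothing and verification between passes are omitted.

```python
def interpolate(field, x, y):
    """Eq. (VI-1): bilinear value at station (x, y) from the 4 surrounding points."""
    i = min(int(x), len(field) - 2)
    j = min(int(y), len(field[0]) - 2)
    di, dj = x - i, y - j
    p1, p2 = field[i][j], field[i + 1][j]
    p3, p4 = field[i + 1][j + 1], field[i][j + 1]
    return p1 + (p2 - p1) * di + (p4 - p1) * dj + (p1 - p2 + p3 - p4) * di * dj


def sat_pass(field, stations, radius):
    """One correction pass. stations: list of (x, y, dps, rwm)."""
    # Eq. (VI-2): station value minus interpolated value of the current field
    errors = [(x, y, dps - interpolate(field, x, y), rwm)
              for x, y, dps, rwm in stations]
    r2 = radius * radius
    corrected = [row[:] for row in field]
    for i in range(len(field)):
        for j in range(len(field[0])):
            total, n = 0.0, 0
            for x, y, e, rwm in errors:
                d2 = (x - i) ** 2 + (y - j) ** 2
                if d2 < r2:                           # station inside radius R
                    w = (r2 - d2) / (r2 + d2)         # Eq. (VI-4)
                    total += rwm * w * e
                    n += 1
            if n:
                corrected[i][j] += total / n          # Eq. (VI-3), added to grid value
    return corrected


def sat_analysis(initial_guess, stations, radii):
    """Apply one pass per radius (normally decreasing) to the initial guess."""
    field = initial_guess
    for radius in radii:
        field = sat_pass(field, stations, radius)
    return field
```

With a single station and RWM = 1, the first pass reproduces the station value exactly at a grid point that coincides with the station, and a later pass finds zero error there and leaves it unchanged. Halving RWM halves the correction, which is how less reliable diagnostic reports can be given less influence.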
SECTION VII


TESTING  THE  HUMIDITY  ANALYSIS  TECHNIQUE 


The developmental testing of the humidity analysis technique (Section VI) was
conducted primarily to answer the following three basic questions:

(a) When should humidity diagnostic data be introduced into an objective
analysis for which RAOB data are also available?

(b) How should the diagnostic data be incorporated into the analysis?

(c) What are the effects on the analysis of introducing the diagnostic data?

The first question concerns both the density of the available RAOB data and the
reliability of the diagnostic information. The second question is answered by
determining the relative weight to be given to the diagnostic data, as well as by
examining certain features of the SAT analysis procedure, such as the number of
passes to make, the size of the search radii, and the degree of smoothing to employ.
The testing was largely concerned with weighting the diagnostic data. The third
question concerns not only whether an improvement in the analysis can be seen in
the verification statistics, but also any changes in the analysis characteristics (for
example, the use of diagnostic data may lower or raise the humidity content
analyzed at grid points and may also modify the scale of the features of the
humidity field that are included in the analyses).

14. Verification Procedures

The test analysis and verification areas are shown in Fig. 2. The two types of
statistics are:

(a) Root-mean-square (rms) error at grid points, and

(b) Contingency-table verification of categories of DPS at grid points.

These statistics are available for the initial guess and after each pass of every map.
Overall statistics are also given for all 22 observation times.
The rms error and contingency-table statistics at grid points for a given
experiment were obtained by comparing the analyzed DPS value at each grid
point in the verification area with the verification dew-point spread analysis
obtained from a previous analysis using all available RAOBS. Since the density
of RAOBS in Europe is extremely high, the verification analysis gives a highly
reliable representation of the moisture field. The verification area was reduced from
the borders of the analysis area by one NWP grid unit to eliminate any distortions
that might occur at the edge of the analysis area.
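The grid-point verification can be sketched in Python as below: both fields are trimmed by one grid unit on every side, then compared point by point. The default category limits in `category_agreement` are a hypothetical choice (the 5°C and 14°C moist/marginal/dry limits used in Section V); the report's own category limits for these contingency tables are defined in Table XVIII.

```python
import math


def interior(field):
    """Drop one grid unit from every border of the analysis area."""
    return [row[1:-1] for row in field[1:-1]]


def paired_values(analysis, verification):
    """(analyzed, verifying) DPS pairs over the verification area."""
    a, v = interior(analysis), interior(verification)
    return [(x, y) for ra, rv in zip(a, v) for x, y in zip(ra, rv)]


def rms_error(analysis, verification):
    pairs = paired_values(analysis, verification)
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def category_agreement(analysis, verification, limits=(5.0, 14.0)):
    """Percent of verification-area grid points whose DPS category matches."""
    def category(dps):
        return sum(dps > lim for lim in limits)   # 0 = moist, 1 = marginal, 2 = dry
    pairs = paired_values(analysis, verification)
    hits = sum(category(x) == category(y) for x, y in pairs)
    return 100.0 * hits / len(pairs)
```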

15.  Data  Characteristics 

Table XXI gives, for each level, the total number of RAOBS for all 22 observation
times having a DPS in the indicated range. The values of DPS shown in Table XXI
are the upper limits of each range (except for the seventh category, DPS ≥ 22°C).
Note that very moist conditions prevailed through much of the period at the 850-mb
level (over half the stations reported a DPS of less than 4°C), while moisture
conditions varied at 700 and 500 mb.


TABLE XXI

DPS FREQUENCY FOR DEVELOPMENTAL SAMPLE

                        Category limit (°C)
Level (mb)    <2    <4    <7   <11   <16   <22   ≥22    Total
   850       517   303   250   120    82    66     4     1342
   700       333   231   270   220   176   150    19     1399
   500       156   294   297   268   164   204    13     1396

The average number of decision-tree and REEP diagnoses at each observation
time for the three levels is shown in Table XXII. The table shows the average
number of stations processed per observation time over the 22 observation times,
and the average number of diagnoses made per observation time. To simulate
regions of varying data density, only a fraction of the radiosonde and diagnostic
data is used in the analysis.
TABLE XXII

AVERAGE NUMBER OF DIAGNOSES PER OBSERVATION TIME

Level     Stations     Decision     REEP          REEP
(mb)      processed    tree         occurrence    non-occurrence
 850        356          252           25             11
 700        356          154           56             24
 500        356          125           32             12

Decision-tree diagnoses predominate at the 850-mb level (in fact, they are made
at about 70 percent of the stations processed). At 700 and 500 mb the number of
REEP diagnoses is somewhat larger, and because there are far fewer decision-tree
diagnoses at these levels, the relative importance of the REEP diagnoses is
increased.

Table XXIII shows the moisture characteristics (moist versus dry) of the two types
of diagnostic information at the three levels. The characteristics and frequency of
the diagnostic data strongly affect the analysis. Some of the effects of the material
contained in these three tables will be seen in Subsection 18, Results.


TABLE XXIII

MOISTURE CHARACTERISTICS OF DIAGNOSTIC DATA

Level (mb)    Decision tree            REEP
   850        Moist only               Dry and moist
   700        Moist only               Moist and dry
   500        Predominantly moist      Predominantly dry

16.  Data  Density  Simulation 


Because the basic purpose of the analysis testing was to evaluate the results of humidity diagnoses in data-sparse regions, procedures were formulated to simulate the data densities characteristic of these regions. Simulating data-sparse regions is somewhat complicated. Approximately 100 stations in the test area report RAOBS at least once during the 22 observation times of the test period, but the average number of RAOBS reported at any one observation time is only about 65. Lists of 90, 75 and 50 withheld stations were systematically compiled from the 100 possible stations, so the number of RAOB stations withheld approximates the percentage of radiosonde stations withheld from the analysis. Table XXIV gives the approximate percentages of radiosonde and diagnostic data used in the low, medium, and high data-density simulations. Figures 20 and 21 show the distribution of stations reporting 700-mb DPS for low and medium data densities at 00Z, Feb. 11, 1962. The distribution changes with each observation time. For the sparse (low) data-density simulation, an average of 6 RAOBS per observation time was included in the analysis. For the intermediate (medium) and dense (high) simulations, the averages were 16 and 31, respectively. Because the test area is approximately the size of the United States, these density types are considered representative of the Northern Hemisphere. A larger share of the surface-station data than of the upper-air data was included in the simulation. This better represents data-sparse regions such as the many ocean areas, where a large number of ships report surface-synoptic data but very few upper-air soundings are taken.
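
As a rough consistency check on these averages, the expected number of RAOBS per observation time can be estimated by scaling the average of about 65 reports by the retained fraction in Table XXIV. This is an illustrative sketch only, not part of the original procedure. Small differences from the reported 6, 16 and 31 are likely, because fixed lists of withheld stations were used rather than an exact fraction of each time's reports.

```python
# Average number of RAOBS reported per observation time in the test area (from the text).
AVG_RAOBS_PER_TIME = 65

# Approximate fraction of RAOB data retained in each simulation (Table XXIV).
RETAINED_FRACTION = {"low": 0.10, "medium": 0.25, "high": 0.50}

# Averages actually reported in the text over the 22 observation times.
REPORTED_AVERAGE = {"low": 6, "medium": 16, "high": 31}


def expected_raobs(fraction, avg_per_time=AVG_RAOBS_PER_TIME):
    """Expected RAOB count per observation time if stations were withheld in proportion."""
    return avg_per_time * fraction


for density, fraction in RETAINED_FRACTION.items():
    print(f"{density:>6}: expected {expected_raobs(fraction):5.2f}, "
          f"reported {REPORTED_AVERAGE[density]}")
```
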


TABLE XXIV

INFORMATION AVAILABLE FOR DATA-DENSITY SIMULATION EXPERIMENTS

Data-density type   Percentage of RAOB data   Percentage of diagnostic data
Low                 10                        20
Medium              25                        33
High                50                        Not used

17.  Experimental  Design 

The principal factors listed below were given careful consideration.

(a) Level. The relative frequencies of moist and dry observations and diagnoses differ considerably among the three levels (see Subsection 15). The impact of the diagnostic data will therefore vary from level to level.

Fig. 20. Distribution of RAOBS at 700 mb on February 11, 1962 for low-data-density simulation (10% of RAOBS used).

Fig. 21. Distribution of RAOBS at 700 mb on February 11, 1962 for medium-data-density simulation (25% of RAOBS used).


(b) Data Density. The amount of diagnostic data available, and the relative weight to give the diagnoses, depend on the density region being simulated and on the variations in data density within the test area.

(c) Initial Guess and Test Area. The first-guess field used in the present GWC humidity analysis (prior to modification) consists of the converted 12-hr CPS trajectory forecasts, called IGDPS in this report. Most of the experiments therefore used the IGDPS field as the initial guess. However, the quality of this initial guess can vary significantly within the test area. Most of the IGDPS values in the western portion of the test area (Western Europe) are derived from trajectories that originated over the Atlantic Ocean, so this portion of the field is expected to contain larger errors. The results will show that this was indeed the case. For this reason, some experiments were performed using data within the western half of the analysis area only. In other experiments the IGDPS field was modified with non-occurrence diagnostic data prior to the analysis. In further experiments an initial-guess field was generated by an averaging procedure from RAOBS and occurrence-diagnostic data.

(d) Relative Weighting Matrix. The relative weighting matrix sets the weight given to the correction made by an individual humidity diagnosis in the SAT analysis [see Eq. (VI-3)]. This weight is relative to the weight given to a RAOB correction, which is always 1. When a humidity diagnosis is used in the analysis, the weight of its correction is bounded by zero and one. With a weight of one, the diagnosis correction is weighted equivalently to a RAOB. Within this range, the analysis takes the numerical weight of each correction from a table typified by Table XXV.

The table is a 5 × 5 matrix. Each of its 25 values is defined uniquely by two quantities: the reliability RI of the diagnosed or observed data, and the station density ρ about the surface station or radiosonde station that provides the humidity data. The values of the RI and ρ categories are also provided to the analysis program; these values are the lower limits of the categories. Table XXVI gives the category values used and their significance.
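
The following sketch shows how a relative weight might scale a single datum's contribution to the SAT correction at one grid point. Eq. (VI-3) is not reproduced in this section, so the form below is an assumption. It uses a Cressman-type distance weight w = (R² - d²)/(R² + d²) multiplied by the RWM entry, and it combines corrections as a weighted mean. The function names and tuple layout are hypothetical.

```python
import math


def cressman_weight(distance, radius):
    """Cressman-type distance weight (assumed form); zero at or beyond the influence radius."""
    if distance >= radius:
        return 0.0
    r2, d2 = radius * radius, distance * distance
    return (r2 - d2) / (r2 + d2)


def sat_correction(grid_x, grid_y, data, radius):
    """Relative-weighted mean correction at one grid point (hypothetical sketch).

    data: iterable of (x, y, error, relative_weight) tuples, where x and y are in grid
    intervals, error is the observed or diagnosed DPS minus the current analysis value at
    that location, and relative_weight is the RWM entry (1 for a RAOB, 0 to 1 for a diagnosis).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, error, relative_weight in data:
        w = relative_weight * cressman_weight(math.hypot(x - grid_x, y - grid_y), radius)
        weighted_sum += w * error
        weight_total += w
    # No usable data inside the influence radius: leave the grid-point value unchanged.
    return weighted_sum / weight_total if weight_total > 0.0 else 0.0
```

For example, a RAOB indicating +2°C and a co-located diagnosis indicating -4°C with relative weight 0.5 cancel exactly under this assumed weighting: `sat_correction(0, 0, [(0, 0, 2.0, 1.0), (0, 0, -4.0, 0.5)], 2.0)` gives 0.0.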

TABLE XXV

RELATIVE WEIGHTING MATRIX (RWM)

Density          Reliability category (RI)
category (ρ)     RI1       RI2       ...       RI5
ρ1               RWM11     RWM12     ...       RWM15
ρ2               RWM21     RWM22     ...       RWM25
...              ...       ...       ...       ...
ρ5               RWM51     RWM52     ...       RWM55

TABLE XXVI

CATEGORY LIMITS OF RELATIVE WEIGHTING MATRIX

(a) Reliability indicator (RI)

Value   Explanation
0.5     REEP diagnosis, RI1: 0.5 ≤ Pr* < 0.6
0.6     REEP diagnosis, RI2: 0.6 ≤ Pr* < 0.7
0.7     REEP diagnosis, RI3: 0.7 ≤ Pr*
1.0     Indicator for decision-tree diagnosis, RI4
2.0     Indicator for RAOB, RI5

*Pr = probability of occurrence

(b) Station densities (ρ)

Value   Explanation
0       Number of stations in area, 0 ≤ ρ1 < 2
2       Number of stations in area, 2 ≤ ρ2 < 4
4       Number of stations in area, 4 ≤ ρ3 < 7
7       Number of stations in area, 7 ≤ ρ4 < 10
10      Number of stations in area, 10 ≤ ρ5

As indicated in Table XXVI, there are five categories each of reliability and of density. Every piece of information used in the analysis falls into one category of RI and one of ρ, and together these determine the value of RWM. RI has three categories of increasing probability of occurrence for the REEP diagnoses, plus one category each for decision-tree diagnoses and RAOBS. The method of computing densities is described in Subsection 11. In general, the numerical value indicates approximately the number of RAOBS and humidity diagnoses in a square of 2 × 2 NWP grid intervals (about 400 miles in the test area). Thus, fewer than 2 observations and diagnoses indicate very limited humidity data in the local area, while 10 or more indicate that a large amount of humidity data is available in that limited region. Various approaches may be tested for weighting the diagnostic corrections as a function of diagnosis reliability and of the amount of other data available. A similar procedure has been used in the analysis of 10-mb heights and temperatures [12]. The main difference is that there the weighting was a function of data timeliness (past data were used) and of data density.


(e) Analysis Characteristics. The analysis options that strongly influence the final humidity analysis are:

- the number of SAT corrections used;
- the size of the influence radius (R), which defines the area about each grid point within which data are used to correct the grid-point DPS; and
- the degree of smoothing (if any) applied after each correction.


It has been demonstrated [13] that the size of the influence radii strongly affects the scale of the parameter features that predominate in the analysis. Smoothing can be applied in the following form [13]:


$$S(i,j) = \frac{\mathrm{DPS}(i,j) + b\,\overline{\mathrm{DPS}}}{1 + b} \qquad \text{(VII-1)}$$


where S(i,j) is the resulting smoothed DPS at the grid point, DPS(i,j) is the unsmoothed value at the grid point, $\overline{\mathrm{DPS}}$ is the mean dew-point spread at the 4 surrounding grid points, and b is a constant that determines the degree of smoothing. The value of b is a variable input to the analysis. If b = 0, no smoothing is applied. If b = 1, the mean DPS at the 4 surrounding grid points is weighted equally with the unsmoothed DPS at the central grid point. Relatively strong smoothing (such as b = 1) tends to suppress small-scale irregularities, and real features as well [13].
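
A minimal sketch of Eq. (VII-1) applied over a grid follows. It assumes the "4 surrounding grid points" are the north, south, east and west neighbours. It also assumes that all points are smoothed from the unsmoothed field and that boundary rows and columns are left unchanged. The report does not state how boundaries are treated, so that choice is hypothetical.

```python
def smooth(dps, b):
    """Apply Eq. (VII-1) at interior points of a 2-D DPS grid (list of lists).

    Each interior value becomes (DPS(i,j) + b * mean of its 4 neighbours) / (1 + b),
    computed from the unsmoothed input; boundary values are copied unchanged.
    """
    if b < 0:
        raise ValueError("b must be non-negative")
    if not dps:
        return []
    rows, cols = len(dps), len(dps[0])
    out = [row[:] for row in dps]  # copy so the input field is not modified
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            mean4 = (dps[i - 1][j] + dps[i + 1][j] + dps[i][j - 1] + dps[i][j + 1]) / 4.0
            out[i][j] = (dps[i][j] + b * mean4) / (1.0 + b)
    return out
```

With a 10°C spread surrounded by 2°C values, b = 1 halves the contrast: the centre becomes (10 + 2)/2 = 6°C. With b = 0 the field is returned unchanged.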

Most of the experiments described in the following subsection deal directly with incorporating the diagnostic data into the analysis in a variety of ways. To evaluate the impact of the diagnostic data, these experiments had to compare identical SAT analysis procedures. However, a limited number of experiments used different SAT correction procedures and, in particular, different degrees of smoothing.

18.  Results 

In each experiment, the analyzed values of DPS at grid points were compared with the verification grid field in the form of a 5 × 5 contingency table. The category widths were 3°C at 850 mb, 4°C at 700 mb and 5°C at 500 mb. The resulting category limits, shown in Table XXVII, were chosen arbitrarily and reflect the differing moisture conditions at the three levels. Table XXVIII is an example of a contingency table from an 850-mb experiment with the following settings: only RAOBS are used, the initial guess is derived from the CPS data, the data density is high, and light smoothing (b = 0.1) is performed. The table is a composite of the individual contingency tables obtained after the final SAT correction at each of the 22 observation times. The percent correct (in this case 53.9%) is the statistic shown in the subsequent tables of this section. The rms errors shown alongside it are overall rms errors at all grid points within the verification area for all 22 observation times.

TABLE XXVII

CONTINGENCY TABLE LIMITS FOR 850-, 700-, AND 500-mb DPS

Level   Category limits (°C)
(mb)    1          2          3           4           5
850     0 - <3     3 - <6     6 - <9      9 - <12     ≥12
700     0 - <4     4 - <8     8 - <12     12 - <16    ≥16
500     0 - <5     5 - <10    10 - <15    15 - <20    ≥20
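
Assigning an analyzed or verifying DPS value to a category in Table XXVII amounts to binning by the level's category width, with the fifth category open-ended. A short sketch follows; the function name is illustrative only.

```python
# Category width (deg C) for each pressure level, from Table XXVII.
CATEGORY_WIDTH_C = {850: 3.0, 700: 4.0, 500: 5.0}


def dps_category(dps_c, level_mb):
    """Return the contingency-table category (1-5) of a dew-point spread in deg C."""
    if dps_c < 0:
        raise ValueError("dew-point spread cannot be negative")
    width = CATEGORY_WIDTH_C[level_mb]
    # Categories 1-4 are equal-width bins starting at 0; category 5 is open-ended.
    return min(int(dps_c // width), 4) + 1
```
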

TABLE XXVIII

CONTINGENCY TABLE EXAMPLE

                       Analysis category
Verification     1      2      3      4      5     Total
category
1              455    119     24     12     10      620
2              182    205     60     21     10      478
3               28     78     67     20     10      203
4               12     20     25     25      8       90
5                2      3      6     16     27       54
Total          679    425    182     94     65     1445

Hits = 779                  Percent correct = 53.9
Hits ±1 category = 1287     Percent correct = 89.1
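
The two scores at the foot of Table XXVIII can be reproduced directly from the composite matrix. "Hits" are the diagonal entries. "Hits ±1 category" also counts entries one category off the diagonal. The sketch below adds a pooled rms-error function of the kind described in the text; the function names are illustrative.

```python
import math

# Composite 850-mb contingency table from Table XXVIII (categories 1-5).
TABLE_XXVIII = [
    [455, 119, 24, 12, 10],
    [182, 205, 60, 21, 10],
    [28, 78, 67, 20, 10],
    [12, 20, 25, 25, 8],
    [2, 3, 6, 16, 27],
]


def contingency_scores(table):
    """Return (hits, percent correct, hits within one category, percent within one)."""
    n = len(table)
    total = sum(sum(row) for row in table)
    hits = sum(table[k][k] for k in range(n))
    near_hits = sum(table[i][j] for i in range(n) for j in range(n) if abs(i - j) <= 1)
    return hits, 100.0 * hits / total, near_hits, 100.0 * near_hits / total


def overall_rms_error(analyzed, verification):
    """Root-mean-square difference pooled over all grid points and observation times."""
    if len(analyzed) != len(verification) or not analyzed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - v) ** 2 for a, v in zip(analyzed, verification)) / len(analyzed))


hits, pct, near, pct_near = contingency_scores(TABLE_XXVIII)
print(hits, round(pct, 1), near, round(pct_near, 1))  # 779 53.9 1287 89.1
```
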


For each of the three data densities, experiments were performed using only radiosonde observations. In all of these experiments the initial guess was the IGDPS field (12-hr trajectory forecasts of CPS converted to DPS). Three SAT corrections were performed, with influence radii of 2.0, 1.5 and 1.0. Smoothing ranged from very light (b = 0.1) to moderately heavy (b = 1.0).

The overall rms errors and percent-correct statistics in Table XXIX for these experiments illustrate three things:

(a) the quality of the DPS initial guess;

(b) the improvement of the initial-guess field that results from SAT corrections using RAOB data of different densities; and

(c) the effects of smoothing of different intensities.

When examining the effects of smoothing on the rms error statistics shown here and in the following tables, certain factors must be kept in mind. Smoothing, by its very nature, tends to lower maxima, raise minima and reduce gradients. This may or may not improve the analysis, but it does make large errors less likely and so increases the chance of obtaining a lower rms error. A better analysis has probably been achieved only if the overall percent-correct score from the contingency table also increases.

TABLE XXIX

ANALYSIS VERIFICATION STATISTICS USING RAOB DATA ONLY

Level   Density              Smoothing   Overall          Overall
(mb)                         (b)         rms error (°C)   % correct
850     IGDPS (unmodified)   -           4.26             39.0
        Low                  0.1         4.02             41.9
        Low                  1.0         3.61             41.0
        Medium               0.1         3.67             46.4
        Medium               0.5         3.37             47.2
        High                 0.1         3.17             53.9
        High                 0.5         2.95             55.0
700     IGDPS (unmodified)   -           5.34             35.9
        Low                  0.1         5.00             38.7
        Low                  0.5         4.71             37.9
        Low                  1.0         4.65             38.3
        Medium               0.1         4.58             44.7
        High                 0.1         3.96             50.6
        High                 0.5         3.71             51.5
        High                 1.0         3.71             48.6
500     IGDPS (unmodified)   -           5.28             43.7
        Low                  0.1         4.91             44.8
        Low                  0.5         4.52             45.1
        Low                  1.0         4.41             45.8
        Medium               0.1         4.34             50.6
        Medium               0.5         3.94             52.6
        Medium               1.0         3.83             52.4
        High                 0.1         3.85             58.4
        High                 0.5         3.48             59.1

The statistics in Table XXIX illustrate the following two points:

(a) The quality of the final analysis depends heavily on the quality of the initial-guess field. A large reduction in rms error and a large increase in percent-correct score, relative to the initial guess, occurred only with a dense network of RAOBS. Such networks are found in very few regions of the Northern Hemisphere.

For example, at 500 mb, with a dense network of RAOBS and moderate smoothing, the rms error was reduced from 5.28 to 3.48 and the percent correct increased from 43.7 to 59.1.

(b) More severe smoothing nearly always lowers the rms error, but it frequently lowers the contingency-table percent-correct score as well. This is particularly true at higher data densities. It is most vividly illustrated at 700 mb with a high density of RAOBS: increasing the smoothing from b = 0.5 to b = 1.0 left the rms error unchanged but decreased the percent-correct score from 51.5 to 48.6.

The primary purpose of the developmental testing of the humidity analysis was to determine the effects of introducing diagnostic data into the analysis. The results in Table XXIX merely indicate two things: the improvements obtained as denser RAOB data are used to correct the initial guess, and the effects of various degrees of smoothing after each SAT correction. The remainder of this section is devoted to the use of diagnostic data in the analysis, with reference to Table XXIX for comparison.

Table XXX gives the various relative weighting matrices used to modify the SAT corrections made by RAOB and diagnostic data. In all but two relative weighting types, all radiosonde data are weighted one. In those two types (I and K), RAOBS are excluded from the analysis to observe the effects of using diagnostic data alone. Types B and C use only decision-tree diagnostic data in addition to the RAOB data. Type E uses only REEP diagnostic data in addition to the RAOB data. Type J excludes REEP data having a category probability of less than 0.60. The remaining relative weighting types use RAOBS, decision-tree diagnoses and REEP diagnoses; they differ in the relative weight given to corrections made by the diagnostic data.

TABLE XXX

RELATIVE WEIGHTING MATRIX TYPES

Rows are density categories ρ1 to ρ5. Columns are reliability categories RI1, RI2 and RI3 (REEP diagnoses), RI4 (decision-tree diagnoses) and RI5 (RAOBS).

          Type A             Type B             Type C
ρ1    0  0  0  0  1      0  0  0  1  1      0  0  0 .5  1
ρ2    0  0  0  0  1      0  0  0  1  1      0  0  0 .5  1
ρ3    0  0  0  0  1      0  0  0  1  1      0  0  0 .5  1
ρ4    0  0  0  0  1      0  0  0  1  1      0  0  0 .5  1
ρ5    0  0  0  0  1      0  0  0  1  1      0  0  0 .5  1

          Type D             Type E             Type F
ρ1   .4 .5 .6 .6  1     .5 .5 .5  0  1     .5 .5 .6 .6  1
ρ2   .3 .4 .5 .5  1     .5 .5 .5  0  1     .4 .4 .5 .5  1
ρ3    0 .3 .4 .4  1     .5 .5 .5  0  1     .3 .3 .4 .4  1
ρ4    0  0 .3 .3  1     .5 .5 .5  0  1     .3 .3 .3 .3  1
ρ5    0  0 .3 .3  1     .5 .5 .5  0  1     .3 .3 .3 .3  1

          Type G             Type H             Type I
ρ1   .5 .6 .7 .6  1     .8 .8 .8 .6  1     .5 .6 .7 .6  0
ρ2   .5 .6 .7 .5  1     .8 .8 .8 .5  1     .5 .6 .7 .5  0
ρ3   .5 .6 .7 .3  1     .7 .7 .8 .3  1     .5 .6 .7 .3  0
ρ4   .5 .6 .7 .3  1     .6 .6 .7 .2  1     .5 .6 .7 .3  0
ρ5   .5 .6 .7 .3  1     .6 .6 .7 .2  1     .5 .6 .7 .3  0

          Type J             Type K
ρ1    0 .8 .8 .6  1     .4 .5 .6 .6  0
ρ2    0 .8 .8 .5  1     .3 .4 .5 .5  0
ρ3    0 .7 .8 .3  1      0 .3 .4 .4  0
ρ4    0 .6 .7 .2  1      0  0 .3 .3  0
ρ5    0 .6 .7 .2  1      0  0 .3 .3  0

Table XXXI gives, for the various relative weighting types, the overall final-pass rms errors and percent-correct contingency-table scores at 850, 700 and 500 mb; "final pass" means after three SAT corrections. The simulated data density and the degree of smoothing are indicated for each experiment. In each analysis three SAT corrections were made, with influence radii of 2.0, 1.5 and 1.0.

Only a limited number of analyses of 850-mb DPS were made. Humidity diagnosis and analysis testing is perhaps least interesting at this level, for two reasons: observed moist conditions predominate, and the diagnostic data are almost entirely limited to (moist) decision-tree diagnoses.


TABLE XXXI

ANALYSIS VERIFICATION STATISTICS USING RAOB AND DIAGNOSTIC DATA

Level   Density   Relative    Smoothing   Overall final-pass   Overall final-pass
(mb)              weighting   (b)         rms error (°C)       % correct
                  type
850     Low       B           0.1         3.72                 47.0
                  B           1.0         3.51                 40.3
                  C           0.1         3.72                 45.2
                  D           0.1         3.08                 43.8
                  G           0.1         3.08                 43.7
850     Medium    B           0.1         3.55                 48.9
                  B           0.5         3.40                 48.4
                  C           0.1         3.53                 47.0
                  D           0.1         3.51                 40.1
                  G           0.1         3.49                 40.3
700     Low       B           0.1         5.18                 39.2
                  C           0.1         4.96                 39.4
                  D           0.1         4.90                 40.2
                  E           0.1         4.95                 39.6
                  G           0.1         4.89                 40.1
                  G           0.5         4.66                 40.1
                  G           1.0         4.62                 40.0
                  H           0.1         4.92                 39.7
                  I           0.1         5.12                 38.5
                  I           1.0         4.84                 38.2
                  J           0.1         5.00                 40.1
700     Medium    C           0.1         4.63                 42.5
                  E           0.1         4.44                 44.9
                  D           0.1         4.53                 43.1
                  G           0.1         4.51                 43.3
500     Low       B           0.1         5.19                 42.6
                  C           0.1         4.78                 46.3
                  D           0.1         4.77                 47.0
                  D           0.5         4.40                 48.2
                  D           1.0         4.30                 48.2
                  E           0.1         5.03                 44.3
                  F           0.1         4.79                 46.3
                  G           0.1         4.82                 46.1
                  H           0.1         4.89                 44.9
                  H           0.5         4.51                 46.1
                  K           0.1         4.99                 45.1
                  K           1.0         4.23                 46.7
500     Medium    B           0.1         4.79                 46.9
                  C           0.1         4.41                 49.9
                  D           0.1         4.37                 51.0
                  E           0.1         4.47                 49.2
                  F           0.1         4.39                 50.8
                  F           0.5         4.01                 50.7
                  F           1.0         3.95                 50.2
                  G           0.1         4.42                 50.5
                  I           0.1         4.93                 45.4

With sparse or intermediate (low or medium) data densities, an analysis using RAOBS only would be expected to produce larger positive errors than negative errors; that is, the analyzed DPS would tend to be too high. This is simply because the "correct" (verification) analysis is generally moist (low DPS). Adding diagnostic data largely eliminates these positive errors and therefore improves the verification statistics.

At 850 mb, comparing Tables XXIX and XXXI shows the greatest improvement in the sparse-data simulation with type B. In that type, all decision-tree diagnostic data are weighted equally with RAOBS and no REEP diagnostic data are included. Under these conditions, with light smoothing (b = 0.1), adding the diagnostic data improves the final-pass rms error from 4.02°C (RAOB only) to 3.72°C. The percent correct improves from 41.9 (RAOB only) to 47.0. Strong smoothing with the same relative weighting type reduces the final-pass rms error again, but the percent-correct score also decreases slightly. The percent-correct scores are lower still in two cases: when the decision-tree diagnoses are weighted one half as much as RAOBS (type C), and when REEP diagnostic data are added while the weight of the decision-tree diagnoses is reduced (types D and G).

These comments also apply, in general, to the medium-data simulation, except that the improvements from decision-tree diagnostic data are more modest. With relative weighting type B and light smoothing (b = 0.1), the final-pass rms error improves from 3.67°C to 3.55°C, and the percent correct from 46.4 to 48.9.

At the 700-mb level a large variety of relative weighting types was used, particularly for the simulation of sparse-data conditions. Unlike the 850-mb results, type B did not produce the best verification statistics at 700 mb. In fact, Tables XXIX and XXXI show that with a sparse-data simulation and light smoothing (b = 0.1), the rms error increases from 5.00°C to 5.18°C. Evidently, using only decision-tree diagnostic data, weighted equivalently to RAOBS, produces an analysis that is too moist.

If the decision-tree diagnostic data are weighted one half as much as RAOBS (type C), the rms error is reduced to 4.96°C. The best verification statistics, however, are obtained when both decision-tree and REEP data are included in the analysis.

Recall from Tables XXII and XXIII that 700-mb REEP diagnoses are both moist and dry. Recall also that the ratio of REEP to decision-tree diagnoses is much higher at 700 mb than at 850 mb. About the same score is obtained whether the REEP diagnoses are weighted relatively lightly (type D) or moderately (type G). Both types share one feature (see Table XXX): where the density of RAOB and diagnostic data is relatively high (ρ > 4), the decision-tree data are weighted less than half as much as the RAOB data.

The reasoning is as follows. Within a large area of simulated sparse or intermediate density, small regions of relatively high data density arise mainly because identical or similar decision-tree diagnoses are made at all or most of the available surface stations in the region. This tends to increase the reliability of the diagnosis, but it also introduces redundant information into the analysis. Giving a fairly high relative weight (one half or greater) to each SAT correction from the decision-tree diagnoses in these high-density regions can easily produce overcorrection. This apparently occurs with types B and C.

Note again that with type G, the percent-correct score remains almost constant as smoothing varies from b = 0.1 to b = 1.0. The rms error, on the other hand, decreases from 4.89°C to 4.62°C.
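
The overcorrection argument can be made concrete with a hypothetical example. Eq. (VI-3) is not reproduced here, so the example assumes, for illustration only, that corrections near a grid point are combined as a relative-weighted mean with equal distance weights. Suppose one RAOB indicates a correction of +2°C, and five nearby surface stations all yield the same decision-tree diagnosis indicating -4°C. Six data points place the area in density category ρ3. Compare the decision-tree diagnoses weighted equally with the RAOB (as in type B) with a reduced weight of 0.3 (the decision-tree entry of type G at ρ3):

$$C_{B} = \frac{1(+2) + 5(1)(-4)}{1 + 5(1)} = \frac{-18}{6} = -3.0\,^{\circ}\mathrm{C}, \qquad C_{G} = \frac{1(+2) + 5(0.3)(-4)}{1 + 5(0.3)} = \frac{-4}{2.5} = -1.6\,^{\circ}\mathrm{C}$$

Under type B, the five redundant diagnoses dominate the correction. The reduced weight leaves the single RAOB a much larger share.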

Several additional experiments were attempted:

- using RAOB data with REEP diagnoses only (type E);
- giving high relative weight to REEP diagnoses (type H); and
- excluding REEP diagnostic data with a probability of occurrence less than 0.60 (type J).

None of these variations improved the verification statistics.

Finally, diagnostic data with relative weights identical to type G were introduced into the analysis with all RAOB data excluded (type I). This measures the improvement over the IGDPS field that can be achieved with diagnostic data alone. To a certain extent, it also simulates an ocean region with no upper-air data but with surface ship reports. Under these conditions, with smoothing of b = 0.1, the rms error of the initial-guess field decreased from 5.34°C to 5.12°C and the percent-correct score increased from 35.9 to 38.5. With sparse radiosonde data alone and equivalent very light smoothing (see Table XXIX), the final-pass rms error was 5.00°C and the percent correct was 38.7. In the data-sparse simulation at 700 mb, therefore, the diagnostic data alone improve the initial guess to about the same extent as the RAOB data alone, particularly in terms of percent correct. With strong smoothing (b = 1.0) and type I, the rms error decreased to 4.84°C, but the percent-correct score also decreased, to 38.2.

In the experiments simulating intermediate data density at 700 mb, adding diagnostic data to the analysis produced no measurable improvement in the verification statistics. With relative weighting types E, D and G and light smoothing, the rms error is slightly smaller than with RAOB data only. The percent-correct score, however, is either lower or about the same.

The results at the 500-mb level were, in general, quite similar to those at 700 mb. For a data-sparse simulation using type B with light smoothing, the rms error increased from 4.91°C (RAOB data only) to 5.19°C, and the percent correct decreased from 44.8 to 42.6. Type C gives an improvement, but the best verification statistics are again obtained with type D and light smoothing. The rms error is then reduced to 4.77°C and the percent correct increased to 47.0. Moderate smoothing (b = 0.5) further lowers the rms error to 4.40°C and raises the percent correct to 48.2. Heavier smoothing (b = 1.0) does not increase the percent-correct score further. Three other types (F, G and H), which give the REEP diagnostic data greater relative weight, all produced verification statistics inferior to those of type D. The same was true of type E (RAOBS and REEP diagnostic data only).

Finally, the same corrections as in type D were used without RAOB data (type K). With strong smoothing (b = 1.0), this gave an rms error of 4.23°C and a percent-correct score of 46.7. RAOB data only, with the same smoothing, gave an rms error of 4.41°C and a percent-correct score of 45.8 (see Table XXIX). Diagnostic data alone therefore considerably improve on the IGDPS scores (rms error = 5.28°C, percent correct = 43.7), and by more than RAOB data alone.

The experiments simulating intermediate (medium) data density at 500 mb led to the conclusion that adding diagnostic data to the analysis is not warranted at this density. A comparison of the pertinent sections of Tables XXIX and XXXI shows that, for a given degree of smoothing, the diagnostic data failed to improve on the verification statistics obtained with RAOBS alone, whatever relative weighting type was used.

Several experiments at 700 mb varied the method of obtaining the initial guess. All the results summarized in Tables XXIX and XXXI used an IGDPS initial guess (12-hr trajectory forecast of CPS converted to DPS). Two other types of initial-guess field are possible; Section VI describes the methods of obtaining them in detail. In one, the IGDPS field is modified using REEP non-occurrence diagnostic data. In the other, the initial guess is generated by an averaging procedure from the data to be used in the analysis. Table XXXII gives the rms errors and percent-correct scores of the "final" initial guess, that is, of the grid-point DPS values to which the first SAT correction is applied. Modifying the IGDPS grid field with non-occurrence REEP data clearly changes the rms errors and percent-correct scores very little. The apparent reason is the relatively small number of these diagnoses left after simulating sparse or medium data densities (see Table XXII for their average number).

TABLE XXXII

ANALYSIS VERIFICATION STATISTICS WITH DIFFERENT INITIAL-GUESS FIELDS

Level   Density   Initial guess                   Overall initial guess   Overall initial guess
(mb)                                              rms error (°C)          % correct

700     -         IGDPS                           5.34                    35.9
        Low       Modified IGDPS                  5.31                    35.9
        Medium    Modified IGDPS                  5.29                    35.5
        Medium    Generated (RAOB only)           5.41                    40.6
        Medium    Generated (RAOB & diagnostic)   5.18                    40.7

On the other hand, the use of a generated initial guess with an intermediate data
density simulation raises the percent-correct score from 35.5 to 40.6 (for RAOBS
only) and to 40.7 (for RAOBS and diagnostic data). The corresponding rms errors
change from 5.34°C to 5.41°C and 5.18°C, respectively. It would seem from these
percent-correct scores that the use of a generated initial guess, at least in areas
of moderate data density, is desirable. This is, however, not actually the case.

An initial-guess field generated by an averaging procedure is essentially a very
smoothed "fit" of the analysis data. Subsequent application of the SAT correction
procedure therefore modifies the initial guess only slightly, particularly when the
number of corrections is limited (as is the case in the data-sparse simulation, or
when using RAOBS only in the intermediate data simulation). Thus, when two otherwise
identical analyses were performed with RAOBS only, the final-pass statistics with
the IGDPS initial guess (4.58°C and 44.7) were considerably better than those
obtained with a generated initial guess (5.36°C and 41.4). If diagnostic data are
included and an identical analysis procedure and relative weighting type are used,
the differences are very small, but the final-pass results obtained using a
generated initial guess were slightly inferior.
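The argument above turns on the fact that a guess built purely by averaging the analysis data cannot be more extreme than the data themselves. The Python sketch below illustrates that property with inverse-distance weighting. This is an assumed stand-in, not the averaging procedure of Section VI, and `generated_guess` and its `power` parameter are hypothetical names.

```python
import math


def generated_guess(nx, ny, obs, power=2.0):
    """Initial guess built only from the analysis data by inverse-distance averaging.

    obs is a list of (x, y, dps) tuples with x, y in grid units. Inverse-distance
    weighting is an assumed stand-in for the report's averaging procedure.
    """
    grid = []
    for j in range(ny):
        row = []
        for i in range(nx):
            num = den = 0.0
            value = None
            for x, y, dps in obs:
                d = math.hypot(x - i, y - j)
                if d == 0.0:  # grid point coincides with a report
                    value = dps
                    break
                w = d ** -power
                num += w * dps
                den += w
            row.append(value if value is not None else num / den)
        grid.append(row)
    return grid


# Two reports, 4 and 12 deg C, at either end of a 5-point row of grid points.
guess = generated_guess(5, 1, [(0, 0, 4.0), (4, 0, 12.0)])
print(guess[0])
```

Every grid value lies between the smallest and largest report, so the field is already a smooth fit of the data, and the residuals of those same reports against it tend to be small.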

A series of experiments was conducted in which analyses were performed only over
the western half of the grid (western Europe). The data on this half of the grid
have two important characteristics: (a) the IGDPS data are derived from 12-hr CPS
trajectories that in many instances originated over the eastern Atlantic Ocean, and
(b) the distribution of RAOBS and surface stations is quite uneven, because large
bodies of water (eastern Atlantic Ocean, North Sea, Norwegian Sea, Baltic Sea, and
Bay of Biscay) are located on the western edge of the grid.

Table XXXIII gives the initial-guess and final-pass rms errors and percent-correct
scores for the 700-mb level for the sparse data (low density) simulation.

It is seen immediately that the IGDPS initial guess is less accurate in this region
than over the entire area: its rms error is 6.25°C, compared with 5.34°C for the
entire area, and the associated percent-correct scores are 29.7 and 35.9,
respectively. Type G, one of the two relative-weighting matrices that proved most
useful in weighting diagnostic data for SAT corrections at 700 mb within the entire
area, was used in the limited western area. With light smoothing (b=0.1), the rms
error is lowered from that of the initial guess to 5.61°C, and the percent-correct
score is 35.7. With heavy smoothing, the rms error is 5.15°C and the percent correct
is 36.6. Unlike the results obtained when the entire area was considered (see
Table XXXI), the application of heavy smoothing with the type G data weight
correction and low density simulation did improve the verification statistics.

This result probably reflects the lower quality of the initial guess and the more
irregular distribution of the data used in the analysis. It should also be noted
from Table XXXIII that, for the same smoothing, adding diagnostic data to the
analysis improved the rms error and percent-correct scores over those obtained
using RAOBS only. The use of a generated initial guess with DWC type G did not
improve the final-pass statistics. Given the irregular data distribution, it is not
surprising that obtaining an initial guess by an averaging procedure fails to
improve the analysis.

A number of additional experiments were performed whose results are not given in
the previous tables. Their rms errors and contingency table percent-correct scores
did not indicate any marked improvement in the analysis. The experiments were:
(a) increasing the influence radius of the first SAT correction to three NWP grid
intervals; (b) performing four instead of three SAT corrections; (c) smoothing the
initial guess as well as the field after each SAT correction; and (d) using a
generated initial guess over the entire test area with sparse data simulation.

TABLE XXXIII

ANALYSIS VERIFICATION STATISTICS FOR WESTERN AREA

Up  to  this  point  the  discussion  has  been  restricted  to  summary  error  statistics 
obtained  over  all  22  observation  times  used  in  the  developmental  testing.  It  is, 
however,  often  interesting  and  instructive  to  examine  individual  observation  times. 

A set of maps at the 500-mb level is shown in Fig. 22(a)-(d) and Fig. 23(a)-(d).
In each figure the maps are (a) the verification or "true" analysis; (b) the IGDPS
(initial guess); (c) the final-pass analysis using RAOB data only, for a sparse-data
density simulation and light smoothing (b=0.1); and (d) the same as (c) except that
diagnostic data are added to the analysis with type D weighting. In shaded areas the
DPS is above 12°C. M indicates areas of low DPS. Isopleths of DPS are drawn for every

The isopleths were drawn subjectively (all by the same analyst) but are based on
the values of DPS obtained at grid points from the objective SAT analysis. The
observation times are 00Z Feb. 11, 1962 and, one day later, 00Z Feb. 12, 1962. The
dates were selected to show instances in which the addition of diagnostic data
results in an improved analysis.

The main features of the DPS verification analysis (Fig. 22(a)) for 00Z Feb. 11,
1962 are:

(a) a small area of large DPS (maximum 23°C) in the southwest corner of the grid,

(b) an area of moderately high DPS extending north-south just west of the center
of the analysis area, with two distinct centers of DPS maximum of 15°C,

(c) a broad area of high DPS over much of the eastern third of the grid, with
maximum values again of 15°C, and

(d) only limited regions where the DPS is moist (4°C or less), the two most
noticeable of which extend north-south in the western portion of the area and
occupy a small area on the northern edge of the grid just east of center.
Fig. 22. 500-mb DPS analyses for 00Z February 11, 1962: (a) 500-mb verification analysis, (b) 500-mb IGDPS,
(c) 500-mb RAOB only, (d) 500-mb RAOB and diagnostic data.

Fig. 23. 500-mb DPS analyses for 00Z February 12, 1962: (a) 500-mb verification analysis, (b) 500-mb IGDPS,
(c) 500-mb RAOB only, (d) 500-mb RAOB and diagnostic data.


The IGDPS initial guess [Fig. 22(b)] shows only fair correspondence with the
verification analysis. The major region of high DPS extends mainly east-west from
the southwest corner toward the south-central part of the analysis area. Two of
the maxima described in (a) and (b) have essentially been combined. The broad
area of high DPS in the eastern regions compares well with the verification analysis,
except that the grid-point values average about 4°C too low. The IGDPS is most
deficient in that the northernmost of the two maxima described in (b) is replaced
by an area of low DPS, and in place of the southern half of the north-south DPS
trough listed in (d) there is a region of maximum DPS.

The sparse-data RAOB-only analysis given in Fig. 22(c) has some serious
deficiencies. Generally speaking, the northern half of the analysis is too moist.
The center of the region of high DPS in the east (c) is analyzed too far south, and
its DPS value is too high. The northernmost of the two maxima described in (b) does
not appear in the analysis at all. The area of maximum DPS in the southwest is too
small, and its values are too low.

A glance at Fig. 22(d), the DPS analysis obtained when both radiosonde and
diagnostic data are used, shows that the location and size of the regions of large
DPS correspond more nearly to the verification analysis. The exception is the
northernmost of the two maxima described in (b). The one feature of the RAOB-only
analysis that is superior to the analysis with diagnostic data (although still faulty)
is the depiction of the north-south DPS trough in the western portion of the grid. In
general, most of the improvement to the analysis resulted from adding decision-tree
and REEP diagnoses of dry conditions to a radiosonde-only analysis that was too moist.

A detailed discussion will not be given of the corresponding set of 500-mb DPS grid
fields one day later, shown in Fig. 23(a)-(d). A comparison of the two verification
analyses shows that marked changes occurred in the humidity distribution in 24
hours; generally, moister conditions prevail. The principal fault in the RAOB-only
analysis is that the values at centers of maximum and minimum DPS are too extreme.
This deficiency is largely corrected by the introduction of diagnostic data into the
analysis. Unlike the previous maps, the greatest improvement here results from the
inclusion of moist diagnoses, which reduce the erroneously high DPS maxima.

Because of the extent and detail of the material presented in this section, it
seems worthwhile to conclude with a summary discussion in the next few paragraphs.
First, a few words of caution. The testing was conducted in a limited area of the
Northern Hemisphere (Europe) for 22 consecutive observation times (11 days) in
Feb. 1962. Much of this region is under a winter maritime regime, and the
characteristics of the DPS analyses at 850, 700 and 500 mb would be expected to
reflect any climatological bias associated with such a regime. The extent and
persistence of high humidity at 850 mb, and the relatively high percentage of surface
stations that yielded diagnoses with particular cloud and weather types, reflect
this bias. The data-density simulations represent, at best, an attempt to
approximate characteristic data densities found in the Northern Hemisphere. It is
felt that in over half of the area of the Northern Hemisphere the data density is
similar to that approximated in the sparse-data simulation. One final note of
caution is that all analyses were performed on an NWP grid (381 km at 60°N). Several
of the conclusions regarding analysis characteristics (in particular, influence radii,
number of SAT corrections, and degree of smoothing) would have to be modified if a
smaller grid interval were used.
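Because the influence radii are specified in grid intervals, their physical size depends on latitude. The sketch below converts grid intervals to kilometers under an assumption not stated in this section: that the NWP grid is polar stereographic and true at 60°N, so one interval spans 381 km × (1 + sin φ)/(1 + sin 60°) at latitude φ. The function name is hypothetical.

```python
import math

GRID_KM_AT_60N = 381.0  # NWP grid interval quoted in the text


def grid_interval_km(lat_deg):
    """Earth distance (km) spanned by one grid interval at latitude lat_deg.

    Assumes a polar stereographic grid true at 60 N, whose map factor is
    (1 + sin 60) / (1 + sin lat).
    """
    sin60 = math.sin(math.radians(60.0))
    return GRID_KM_AT_60N * (1.0 + math.sin(math.radians(lat_deg))) / (1.0 + sin60)


# Influence radii of the three SAT corrections, in grid intervals, at 60 N and 45 N.
for radius in (2.0, 1.5, 1.0):
    print(radius, round(radius * grid_interval_km(60.0)), round(radius * grid_interval_km(45.0)))
```

Under this assumption a radius of two grid intervals covers roughly 760 km at 60°N but somewhat less over southern Europe.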

However,  considering  the  present  overall  density  of  RAOBS  in  the  Northern 
Hemisphere,  it  would  be  difficult  to  justify  the  use  of  a  smaller  grid  interval. 

The quality of the final DPS analysis is highly dependent on the quality of the
initial-guess field. This is, of course, particularly true in regions of sparse data,
where the effects of the SAT corrections on the initial grid-point values of DPS are
limited. Three types of initial-guess fields were used. The initial guess obtained
from 12-hr trajectory forecasts of CPS was less reliable at 700 and 500 mb than
at 850 mb, and less reliable in the western area than in the eastern area of the grid
at all levels. The increase in rms error at higher levels is at least partly due to
the greater variability of humidity at these levels. The increase in error in the
western half of the grid is associated with the fact that the trajectories producing
the initial guess there originate over the eastern Atlantic Ocean, a region of sparse
data. The modification of this initial guess using REEP non-occurrence diagnostic data
was ineffective because of the limited number of these diagnoses (with sparse or
medium data density simulations) and perhaps also because of the very conservative
manner in which the IGDPS data were adjusted with them. The use of a generated
initial guess (obtained by averaging the data to be used in the analysis) did not
improve the final analysis under sparse or medium data densities.

The limited testing of the analysis characteristics of the SAT correction procedure
indicated that three corrections are sufficient. In fact, the improvement in the
analysis (as reflected in lower rms errors or higher percent-correct scores from the
5 by 5 contingency tables) is often very small between the second and third
corrections. The associated influence radii of 2.0, 1.5, and 1.0 grid intervals yield
reasonable results. Moderate smoothing (b=0.5) is useful in regions of sparse data
and possibly also in areas of medium data density. Otherwise, very light smoothing
(b=0.1) should be applied.
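The successive-correction procedure summarized above can be sketched as follows. Several details are assumptions made only for illustration: Cressman-type distance weights, bilinear interpolation of the current analysis to the report locations, relative data weights entering a weighted mean of the corrections, and a five-point smoother in which b scales the adjustment toward the mean of the four neighbors. Only the three radii (in grid intervals) and the role of b come from the text; `sat_analysis` and its helpers are hypothetical names.

```python
import math


def cressman_weight(dist, radius):
    """Weight falling from 1 at the report to 0 at the influence radius."""
    if dist >= radius:
        return 0.0
    r2, d2 = radius * radius, dist * dist
    return (r2 - d2) / (r2 + d2)


def interpolate(grid, x, y):
    """Bilinear value of grid (indexed grid[j][i]) at (x, y) in grid units, inside the grid."""
    ny, nx = len(grid), len(grid[0])
    i0 = min(max(int(math.floor(x)), 0), nx - 2)
    j0 = min(max(int(math.floor(y)), 0), ny - 2)
    fx, fy = x - i0, y - j0
    return ((1 - fx) * (1 - fy) * grid[j0][i0] + fx * (1 - fy) * grid[j0][i0 + 1]
            + (1 - fx) * fy * grid[j0 + 1][i0] + fx * fy * grid[j0 + 1][i0 + 1])


def smooth(grid, b):
    """Five-point smoother on interior points; b = 0 leaves the field unchanged."""
    out = [row[:] for row in grid]
    for j in range(1, len(grid) - 1):
        for i in range(1, len(grid[0]) - 1):
            lap = (grid[j - 1][i] + grid[j + 1][i] + grid[j][i - 1] + grid[j][i + 1]
                   - 4.0 * grid[j][i])
            out[j][i] = grid[j][i] + 0.25 * b * lap
    return out


def sat_analysis(guess, obs, radii=(2.0, 1.5, 1.0), b=0.1):
    """Successive corrections of an initial-guess DPS grid, smoothing after each pass.

    obs is a list of (x, y, dps, weight) with x, y in grid units; weight is the
    relative data weight (1.0 for a radiosonde, smaller for a diagnosis).
    """
    grid = [row[:] for row in guess]
    for radius in radii:
        # Report-minus-analysis differences at the report locations.
        resid = [(x, y, v - interpolate(grid, x, y), w) for x, y, v, w in obs]
        new = [row[:] for row in grid]
        for j in range(len(grid)):
            for i in range(len(grid[0])):
                num = den = 0.0
                for x, y, r, w in resid:
                    cw = w * cressman_weight(math.hypot(x - i, y - j), radius)
                    num += cw * r
                    den += cw
                if den > 0.0:  # at least one report within the radius
                    new[j][i] = grid[j][i] + num / den
        grid = smooth(new, b)
    return grid


guess = [[5.0] * 7 for _ in range(7)]
final = sat_analysis(guess, [(3.0, 3.0, 15.0, 1.0)], b=0.0)
print(final[3][3], final[3][4], final[3][5])
```

With `b=0.0` a single report raises every grid point strictly within the first radius to the reported value and the later passes find nothing left to correct; a positive `b` spreads and damps the resulting gradient after each pass.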

The introduction of diagnostic data into the analyses resulted in improved
verification scores at all levels with a sparse data density simulation. For
intermediate data density, the effect on the verification statistics was negligible at
700 and 500 mb, while some improvement was noted at the 850-mb level. This small
deviation in results at 850 mb is probably linked to the sample characteristics. In
general, the improvements in rms error and percent-correct score are relatively small,
but probably significant when compared with the improvements that result when RAOBS
of intermediate or high density are used in the analysis with the identical initial
guess. The testing of different relative weightings given to the diagnostic data
indicated that the density of all data in a local region, as well as the type and
reliability of the diagnostic data, should be accounted for.

At 700 and 500 mb, the best results were obtained when relative weighting type D
was used to weight the diagnostic data. The important features of this data-weight
correction are: (a) RAOB data are given full weight; (b) decision-tree diagnoses are
weighted less than half (.3 or .4) in local regions where the data density is high
(primarily caused by repetitive decision-tree diagnoses, that is, redundant
information), and half or greater (.5 or .6) in local regions of sparse data; and
(c) REEP diagnoses with a probability of occurrence P > .70 are weighted the same as
decision-tree diagnoses, while REEP diagnoses with a lower P are weighted less. At
850 mb, REEP diagnoses are unimportant because of their infrequency in the test area
and time period; the best verification scores there were obtained by weighting the
decision-tree diagnoses the same as RAOBS. In experiments at 700 and 500 mb with
data-sparse conditions simulated, it was found that using diagnostic data alone
improved the verification statistics of the initial guess as much as or more than
using RAOBS alone. In areas where no RAOBS are available, application of strong
smoothing (b=1.0) may be desirable.
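The rules of weighting type D can be written as a small function. The choice of .4 and .5 from the quoted ranges, the boolean density flag, and the factor of one half applied to REEP diagnoses with P ≤ .70 are hypothetical; the text says only that such diagnoses are "weighted less". The function and parameter names are likewise illustrative.

```python
def type_d_weight(source, dense, reep_p=None,
                  dense_weight=0.4, sparse_weight=0.5,
                  p_threshold=0.70, low_p_factor=0.5):
    """Relative weight of one datum in the spirit of weighting type D.

    source: "raob", "tree" (decision-tree diagnosis) or "reep".
    dense: True where the local density of data is high.
    dense_weight and sparse_weight pick one value from the quoted ranges
    (.3-.4 and .5-.6); low_p_factor is a hypothetical reduction for REEP
    diagnoses whose probability of occurrence does not exceed p_threshold.
    """
    if source == "raob":
        return 1.0  # radiosonde data always receive full weight
    tree_weight = dense_weight if dense else sparse_weight
    if source == "tree":
        return tree_weight
    if source == "reep":
        if reep_p is None:
            raise ValueError("a REEP diagnosis needs its probability of occurrence")
        return tree_weight if reep_p > p_threshold else low_p_factor * tree_weight
    raise ValueError(f"unknown data source: {source}")


print(type_d_weight("raob", dense=True))
print(type_d_weight("tree", dense=False))
print(type_d_weight("reep", dense=False, reep_p=0.62))
```

Such a weight could be supplied as the per-report weight of a successive-correction scheme, so that redundant diagnoses in well-observed regions pull the analysis less than an isolated diagnosis does.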
SECTION VIII


CONCLUSIONS  AND  RECOMMENDATIONS 


A technique to diagnose dew-point spread (DPS) at 850, 700, 500 and 400 mb, using
Northern Hemisphere surface-synoptic data, has been developed for both the warm and
cold [2] seasons of the year. The approach consists of first isolating, within a
step-wise decision-tree framework, those individual surface-observed elements that
yield highly reliable estimates of upper-level humidity. After all individual,
high-quality diagnostic relations of this type have been exhausted, a statistical
technique (REEP) is applied to the remaining cases (residual sample) to derive
equations that yield probabilities of occurrence for three categories of moisture.
A set of decision-tree relations and REEP equations was derived for each of the four
constant-pressure surfaces (850, 700, 500 and 400 mb) for the winter season [2], and
for all but the 400-mb level for the summer season.
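REEP is a multiple linear regression in which the predictand for each category is a 0/1 indicator of whether that category occurred and the predictors are 0/1 dummy variables. The sketch below fits such equations by ordinary least squares in pure Python. It omits the screening (stepwise selection) of predictors that REEP applies, and `fit_reep` and `solve` are hypothetical names. Because the category indicators sum to one for every case, the fitted additive constants sum to one and each predictor's coefficients sum to zero, just as the additive constants of each set of equations in Table XXXVII sum to 1.000.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_reep(predictors, categories, n_categories):
    """Least-squares REEP-style fit on 0/1 dummy predictors.

    predictors: one list of 0/1 dummy values per case; categories: observed
    category index (0-based) per case. Returns, for each category, the additive
    constant followed by one coefficient per dummy predictor.
    """
    rows = [[1.0] + [float(v) for v in x] for x in predictors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    equations = []
    for c in range(n_categories):
        y = [1.0 if cat == c else 0.0 for cat in categories]  # event indicator
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
        equations.append(solve(xtx, xty))
    return equations


# Toy sample: one dummy predictor, three DPS categories.
X = [[0], [0], [0], [0], [1], [1]]
cats = [0, 0, 1, 2, 2, 2]
for eq in fit_reep(X, cats, 3):
    print([round(v, 3) for v in eq])
```

With a single dummy predictor, the constants reproduce the category frequencies when the predictor is 0 (here .50, .25, .25) and the coefficients are the changes in those frequencies when it is 1.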

The decision-tree relations effectively utilize those individual surface element types
whose occurrence yields a highly reliable estimate of upper-level DPS. The REEP
technique selects surface element types whose occurrence or non-occurrence, in
conjunction with the occurrence or non-occurrence of other surface variables, yields
information regarding the probabilities of occurrence (by categories) of upper-level
DPS.

At  850  and  700  mb  all  decision-tree  diagnoses  are  moist,  while  at  500  and 
400  mb,  both  moist  and  dry  diagnoses  are  made.  Diagnoses  of  both  moist  and  dry 
categories  of  DPS  are  made  with  REEP.  The  reliability  of  a  diagnosis  is  indicated 
by  the  probability  of  occurrence  assigned  to  the  diagnosed  category. 

It is concluded that the combined decision-tree and REEP approach is a logical and
fruitful method of diagnosing upper-level moisture from surface observations.
Limited comparisons of the warm- and cold-season decision trees using independent
data indicated that the use of two sets of relationships is useful. The percentage of
surface stations that yield useful upper-level humidity diagnoses varies with season,
level, and the minimum probability required for a REEP diagnosis to be made.
However, this percentage is generally between 25 and 50 percent. Since there are more
than five times as many stations reporting surface-synoptic data as there are
reporting radiosonde observations, the diagnostic technique described in this report
provides a vast increase in humidity information.
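As a rough worked ratio using only the two figures quoted above, let N_diag be the number of diagnoses at a level and N_RAOB the number of radiosonde reports (symbols introduced here for illustration):

```latex
\frac{N_{\mathrm{diag}}}{N_{\mathrm{RAOB}}} > 5 \times (0.25\ \text{to}\ 0.50) = 1.25\ \text{to}\ 2.5
```

On these figures the total number of humidity values at a level would be roughly 2.25 to 3.5 times the number of radiosonde reports, although each diagnosis is of lower and more variable quality than a radiosonde observation.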

The humidity diagnoses, while useful, are of variable quality. It is still necessary
to know how to combine diagnoses and radiosonde observations of humidity into an
analysis, and how to determine the effects of introducing the diagnostic data.

The incorporation of diagnostic data obtained from the cold-season relationships
into a SAT (successive approximation technique) humidity analysis on an NWP grid at
the 850-, 700- and 500-mb levels is tested using European surface and upper-air data
for 22 consecutive observation times in February 1962. Various data densities are
simulated by withholding a portion of both surface and upper-air data. Root-mean-square
errors and contingency table statistics indicate that an improved analysis is
obtained in sparse-data regions by weighting the analysis corrections made by the
diagnostic data relative to the radiosonde data corrections. The most appropriate
weighting is a function of the reliability of the diagnosis and the data density in
the area local to the diagnosis. At 850 mb, the addition of decision-tree data only
(no REEP diagnoses) most improved the analysis. At 700 and 500 mb, the combined use
of both decision-tree and REEP diagnoses was useful. The diagnostic data receive
increasing relative weight as their reliability increases and as the density of data
(both observed and diagnosed) in the limited area about the diagnosis decreases. In
general, the improvements in rms error and percent-correct score are relatively
small, but probably comparable to the improvement that would result if the number of
RAOBS were increased from six to nine or ten in an area of the size used in the
developmental tests. Considering the cost of maintaining weather ships in data-sparse
ocean regions, an improvement of this magnitude is significant.

The humidity analysis technique was tested within a limited area and time period.
The results, however, do indicate that in sparse-data regions of the Northern
Hemisphere (over half the total area) it is desirable to include diagnostic data with
whatever RAOBS are available in obtaining a humidity analysis. Certain features of the
SAT technique, such as the number of corrections, size of influence radii, and degree
of smoothing used, will affect the analysis. The testing indicated that the most
important of these is smoothing, and that in data-sparse regions moderate or heavy
smoothing should be applied after each SAT correction.

It  is  recommended  that  the  following  additional  development  and  testing  be 
performed. 

A more complete comparison should be conducted between the warm-season decision
trees and REEP equations and the cold-season decision trees and REEP equations. It
has been suggested that, considering the developmental samples, the warm-season
relations should be used from July through November and the cold-season relations
from November or December through June. Testing for selected months of the year,
using surface and upper-air data from several regions of the Northern Hemisphere,
would provide firmer guidelines regarding the use of one or the other set of
relations.

Expanded testing of the analysis technique is also advisable, using data from a
different area and time of year than were used here. This study should include a
careful examination of the changes in the error fields of individual analyses that
result when diagnostic data are used. Particular attention should be directed toward
the distribution of data as well as the overall density.

It has been seen that, in data-sparse regions, the quality of the resultant analysis
is highly dependent on the quality of the initial guess. Further effort should be
directed toward improving the initial guess by either (a) more extensive modification
of the initial guess using diagnostic data, or (b) modification or expansion of the
present GWC trajectory prediction technique. The first suggestion has only limited
promise because the frequency of diagnostic data is limited in data-sparse regions,
and the extensive use of these data both to modify the initial guess and to
contribute to SAT corrections tends to be redundant. In the second approach, the
predictive skill of the present trajectory technique should be evaluated first, and
then an estimate should be made of the likelihood of improvement through additional
physical or dynamical modeling.
Continued effort should be directed toward the design of an upper-air observation
network and associated transmission procedures that would result in significant
improvement in both the horizontal and vertical depiction of moisture.
APPENDIX


REDERIVATION  OF  850-mb  COLD-SEASON  REEP  RELATIONSHIPS 

The cold-season REEP equations used to diagnose 850-mb DPS when a decision-tree
diagnosis cannot be made were rederived; the results presented here replace the
equations given in the earlier report [2].

1.  Variables  Selected 

In the statistical evaluation of the 850-mb residual sample, 3 and 4 categories
of 850-mb DPS were used (see Table XXXIV).


TABLE XXXIV

850-mb COLD SEASON DPS CATEGORY LIMITS

     4-category breakdown              3-category breakdown
Category    Limits (°C)           Category    Limits (°C)

   1        0 ≤ DPS ≤ 4              1        0 ≤ DPS ≤ 5
   2        4 < DPS ≤ 8              2        5 < DPS ≤ 10
   3        8 < DPS ≤ 13             3        10 < DPS
   4        13 < DPS
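The limits of Table XXXIV can be applied mechanically, as in the Python sketch below. It assumes, as the table's category 4 (13 < DPS) implies, that a spread exactly equal to an upper limit belongs to the lower category. The function and constant names are hypothetical.

```python
# Upper limits (deg C) from Table XXXIV; a DPS equal to a limit falls in the lower category.
FOUR_CATEGORY_LIMITS = (4.0, 8.0, 13.0)
THREE_CATEGORY_LIMITS = (5.0, 10.0)


def dps_category(dps, limits):
    """Return the 1-based 850-mb DPS category for the given upper limits."""
    if dps < 0:
        raise ValueError("dew-point spread cannot be negative")
    for k, upper in enumerate(limits, start=1):
        if dps <= upper:
            return k
    return len(limits) + 1  # open-ended driest category


print([dps_category(v, FOUR_CATEGORY_LIMITS) for v in (4.0, 4.5, 13.0, 13.5)])
print([dps_category(v, THREE_CATEGORY_LIMITS) for v in (5.0, 7.0, 10.5)])
```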

The dummy variables selected by REEP to diagnose 3 and 4 categories of 850-mb DPS
(the specificand) are quite similar (see Table XXXV). For both specificand
breakdowns, the first two dummy variables selected are dry and moist categories of
surface DPS. This simply reflects the strong positive correlation between moisture at
the surface and at 850 mb. The third dummy variable selected, low-cloud height greater
than 8,000 feet or no low cloud, contributes significantly to a non-occurrence of
category 1 (moist) and an occurrence of category 3 (dry) in the 3-category
specificand breakdown, which is consistent with meteorological reasoning.

TABLE XXXV

850-mb RESIDUAL SAMPLE: SELECTED VARIABLES

Order of    Variables selected
selection   4-category diagnosis            3-category diagnosis

1           10 < DPS                        10 < DPS
2           0 ≤ DPS ≤ 3                     0 ≤ DPS ≤ 3
3           8000 ≤ h (or no low cloud)      8000 ≤ h (or no low cloud)
4           -5 < T_d ≤ 10                   T ≤ -15
5           6 < DPS ≤ 10                    -15 < T ≤ 0
6           -15 < T ≤ 0                     nt = 0.0
7           nt = 0.0                        6 < DPS ≤ 10
8           T ≤ -15                         ww = 02
9           ww = 02                         -3.1 < app ≤ -1.6
10          10 < DPS ≤ 17                   -5 < T_d ≤ 10
11          0.1 < N_T ≤ 0.5                 (illegible in source)
12          -3.1 < app ≤ -1.6               -
13          15 < T ≤ 30                     -
14          0 < T ≤ 15                      -
15          1.5 < app ≤ 3.0                 -

2. Evaluation of Results on Dependent and Independent Data (850-mb)

The REEP equations developed to diagnose 4 and 3 categories of 850-mb DPS were
tested on a dependent sample of 5328 cases and an independent sample of 1276 cases.
The contingency table results are presented in Table XXXVI. In both the 4- and
3-category diagnoses, the percent-correct score varies little from dependent to
independent data. The frequency of diagnoses in each category compares very well
with the observed frequency in the 3-category contingency table, and less well in
the 4-category table. On the independent sample, the percent-correct scores are
41.9 and 52.0 for the 4- and 3-category diagnoses, respectively. These scores are
only slightly lower (about 1 percent) than those obtained with the 700-mb
cold-season residual-sample REEP equations [2].

A comparison of the decision-tree-plus-REEP approach with the REEP-only approach
was made with the rederived relationships; the only effect of introducing the
results described here was to emphasize further the superiority of the
decision-tree-plus-REEP approach. The comparison of Boolean and non-Boolean
variables with the 850-mb residual sample was also redone with the new relationships.
The results obtained were the same as before; that is, the use of Boolean variables
does not increase the diagnostic skill (for a detailed discussion of these comparisons
the reader is referred to the earlier report [2]).

3.  Recommended  Procedure  at  850  mb  and  Additional  Comments 

Table XXXVII contains the two sets of coefficients for the REEP equations used to
diagnose 4 and 3 categories of 850-mb DPS. It is recommended that the 3-category
REEP equations for 850 mb be used when a diagnosis cannot be made from the 850-mb
cold-season decision tree.
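The recommended 850-mb procedure (decision tree first, 3-category REEP equations otherwise) can be sketched in Python using the 3-category coefficients and additive constants of Table XXXVII. Each category probability is the additive constant plus the coefficients of the dummy variables of Table XXXV that are satisfied. The decision tree itself is not reproduced here, so its outcome is passed in as `tree_category`. Choosing the category of highest probability is an assumption, since the minimum probability required before a REEP diagnosis is issued is not given in this appendix. All names are hypothetical. Each row of three coefficients sums to zero within rounding, which offers a check on the transcription.

```python
# 3-category, 850-mb cold-season REEP coefficients transcribed from Table XXXVII.
# Row k multiplies dummy predictor k of Table XXXV (3-category column); a dummy is
# 1 when its condition holds and 0 otherwise:
#   1: 10 < DPS          2: 0 <= DPS <= 3      3: 8000 <= h (or no low cloud)
#   4: T <= -15          5: -15 < T <= 0       6: nt = 0.0
#   7: 6 < DPS <= 10     8: ww = 02            9: -3.1 < app <= -1.6
#  10: -5 < T_d <= 10   11: (illegible in the source table)
COEFFICIENTS = [
    (-0.148, -0.249, 0.397),
    (0.132, -0.109, -0.022),
    (-0.142, 0.015, 0.127),
    (0.130, 0.007, -0.137),
    (-0.052, 0.110, -0.058),
    (-0.072, 0.030, 0.042),
    (-0.095, -0.017, 0.113),
    (-0.057, -0.011, 0.068),
    (-0.069, -0.059, 0.128),
    (-0.045, -0.021, 0.067),
    (0.098, -0.055, -0.042),
]
CONSTANTS = (0.433, 0.370, 0.197)


def reep_probabilities(dummies):
    """Probabilities of categories 1-3: constant plus coefficients of satisfied dummies."""
    if len(dummies) != len(COEFFICIENTS):
        raise ValueError("expected one 0/1 value per selected predictor")
    probs = list(CONSTANTS)
    for x, row in zip(dummies, COEFFICIENTS):
        if x:
            for c in range(3):
                probs[c] += row[c]
    return probs


def diagnose_850(dummies, tree_category=None):
    """Return (category, probability); a decision-tree diagnosis takes precedence."""
    if tree_category is not None:
        return tree_category, None
    probs = reep_probabilities(dummies)
    best = max(range(3), key=lambda c: probs[c])
    return best + 1, probs[best]


# Surface DPS above 10 deg C and no other selected predictor satisfied.
print(diagnose_850([1] + [0] * 10))
```

In this example the dry dummy alone moves the most probable category from 1 (constants only) to 3, with a probability of about .59.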




TABLE  XXXVI 

850-mb  RESIDUAL  SAMPLE  (COLD  SEASON) 
TABLE XXXVII

COEFFICIENTS OF REEP EQUATIONS (850 mb)*

               4-category equations                3-category equations
Order       Cat. 1  Cat. 2  Cat. 3  Cat. 4      Cat. 1  Cat. 2  Cat. 3

1            -.120   -.293   -.090    .503       -.148   -.249    .397
2             .127   -.076   -.041   -.010        .132   -.109   -.022
3            -.172    .003    .060    .109       -.142    .015    .127
4            -.018   -.022   -.011    .051        .130    .007   -.137
5            -.055   -.124    .137    .042       -.052    .110   -.058
6            -.129   -.079    .031    .176       -.072    .030    .042
7            -.092   -.004    .054    .041       -.095   -.017    .113
8            -.022   -.026   -.092    .140       -.057   -.011    .068
9            -.035   -.033    .008    .060       -.069   -.059    .128
10            .039    .016    .183   -.238       -.045   -.021    .067
11           -.062    .008    .023    .032        .098   -.055   -.042
12           -.068   -.050    .005    .113           -       -       -
13           -.083   -.091   -.098    .271           -       -       -
14           -.104   -.143   -.013    .261           -       -       -
15            .014    .042    .005   -.060           -       -       -
Additive
constant      .473    .497    .174   -.144        .433    .370    .197

*See Table XXXV for variables selected.